Research on System Performance Optimization Techniques Based on the Checkpoint Mechanism
Published: 2018-12-28 18:56
[Abstract]: Computer systems are now widely used in transportation, medicine, navigation, aviation, and many other fields, and the demands placed on their reliability keep rising. In practice, the nature of software and hardware makes it impossible for a system never to fail. Consider a long-running task: if a fault occurs during execution, the task has to be restarted from the beginning, and the work already done is wasted. Being able to tolerate faults when they occur is therefore particularly important. Checkpointing is an effective fault-tolerance technique of this kind and is widely used in computer and database systems to improve reliability. By setting a checkpoint at intervals while the task runs, it prevents a failure from discarding a large amount of completed computation and so improves system performance.
To address the high checkpointing overhead of the one-level recovery scheme, Vaidya proposed the so-called two-level recovery scheme, which aims to reduce the checkpointing overhead incurred while a task runs. The two-level scheme uses two types of checkpoint with different overheads, the N-checkpoint and the local checkpoint, which are saved to remote storage and to local disk, respectively. Setting a local checkpoint costs less than setting an N-checkpoint. To optimize performance, Vaidya used numerical search to obtain a checkpoint placement strategy under an exponential failure distribution.
This thesis proposes a new two-level checkpoint placement strategy that determines where local checkpoints and N-checkpoints are placed over the whole run of the system. The strategy applies not only when failures follow an exponential distribution but also to more complex distributions, such as the Weibull distribution. The results show that the proposed strategy achieves good performance. The thesis also analyzes the factors that influence the optimal number of local checkpoints between adjacent N-checkpoints. The results show that this number is affected by the ratio of the overheads of the two checkpoint types and by the ratio of the probabilities of the two kinds of failure.
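The abstract mentions choosing a checkpoint placement by numerical search under an exponential failure distribution. As a rough illustration of that idea only, the sketch below uses a simplified one-level model with equally spaced checkpoints; it is not Vaidya's two-level scheme and not the strategy proposed in the thesis. The function names, the parameters, and the modeling assumptions (a failure rolls the task back to the most recent checkpoint, a failure during checkpointing loses the current interval, and no failure occurs during the restart itself) are all assumptions made for this example.

```python
import math


def expected_completion_time(work, n_intervals, ckpt_cost, restart_cost, failure_rate):
    """Expected run time when `work` is split into n equal intervals.

    Assumed model: failures arrive as a Poisson process with rate
    `failure_rate`; each interval ends with a checkpoint costing `ckpt_cost`;
    a failure costs `restart_cost` and repeats the current interval.
    """
    attempt = work / n_intervals + ckpt_cost  # failure-free length of one interval
    # Expected time to finish one interval: (1/rate + R) * (exp(rate * attempt) - 1)
    per_interval = (1.0 / failure_rate + restart_cost) * math.expm1(failure_rate * attempt)
    return n_intervals * per_interval


def best_interval_count(work, ckpt_cost, restart_cost, failure_rate, max_intervals=1000):
    """Brute-force numerical search for the interval count with the lowest expected time."""
    return min(
        range(1, max_intervals + 1),
        key=lambda n: expected_completion_time(work, n, ckpt_cost, restart_cost, failure_rate),
    )


if __name__ == "__main__":
    # Hypothetical parameters, all in the same time unit.
    n = best_interval_count(work=1000.0, ckpt_cost=5.0, restart_cost=2.0, failure_rate=0.01)
    t = expected_completion_time(1000.0, n, 5.0, 2.0, 0.01)
    print(f"best interval count: {n}, expected completion time: {t:.1f}")
```

A two-level version would additionally have to distinguish the two failure types and the two checkpoint costs, which is where the overhead ratio and failure-probability ratio discussed in the abstract would enter.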
[Degree-granting institution]: Xidian University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP302.8
Article ID: 2394304
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2394304.html