Research on Rollback-Recovery Fault-Tolerance Techniques Based on Dependency Tracking and Message Counting
[Abstract]: Many scientific and engineering applications now run on distributed computing systems. As these systems grow in scale and node count, the probability of failure grows too. For a system to guarantee correct results, or still meet application requirements, after a fault or exception occurs, it must be fault tolerant. Rollback-recovery fault tolerance, which relies on time redundancy rather than node redundancy, is the mainstream technique for making high-performance distributed computing reliable. However, while rollback recovery ensures reliability, it also introduces considerable extra overhead, which has largely limited its adoption and development. Finding ways to reduce the overhead of rollback-recovery protocols and improve execution efficiency is therefore highly significant. This thesis makes two main contributions. First, to address the high logging overhead caused by synchronization constraints in traditional message logging protocols, it proposes a lightweight message logging protocol based on dependency tracking. The protocol exploits the message-passing behavior of programs at run time and uses piggybacking to remove synchronization constraints from message logging. Message data are logged at the sender without any constraint, while message commit information travels with outgoing messages as dependencies propagate and is stored at the dependent processes, which likewise introduces no constraint. Because the processes storing commit information keep track of what they hold, unnecessary transmissions are avoided and less information is piggybacked on each message, which keeps the protocol lightweight.
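The logging scheme described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `Determinant`, `Message`, `Process`, `send`, and `receive`, the sequence-number bookkeeping, and the rule of piggybacking only determinants a peer does not yet hold are assumptions made for illustration. The sender keeps the payload in its own log without waiting for anyone, the receiver records a delivery determinant (its commit information), and that determinant rides on the receiver's later outgoing messages to the processes that become dependent on it.

```python
# Hypothetical sketch of sender-based logging with piggybacked dependency
# tracking; names and message format are illustrative, not from the thesis.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Determinant:
    """Records that message `ssn` from `sender` was the `rsn`-th delivery at `receiver`."""
    sender: int
    ssn: int
    receiver: int
    rsn: int


@dataclass(frozen=True)
class Message:
    src: int
    dst: int
    ssn: int
    payload: str
    piggyback: frozenset  # determinants carried along with the application data


@dataclass
class Process:
    pid: int
    next_ssn: int = 0
    next_rsn: int = 0
    send_log: dict = field(default_factory=dict)  # (dst, ssn) -> payload, logged at the sender
    known: set = field(default_factory=set)       # determinants this process stores
    peer_has: dict = field(default_factory=dict)  # peer pid -> determinants that peer already holds

    def send(self, dst: int, payload: str) -> Message:
        ssn = self.next_ssn
        self.next_ssn += 1
        # Sender-based logging: keep the payload locally, no acknowledgement to wait for.
        self.send_log[(dst, ssn)] = payload
        seen = self.peer_has.setdefault(dst, set())
        # Piggyback only determinants the destination did not create and does not hold yet.
        extra = frozenset(d for d in self.known if d.receiver != dst and d not in seen)
        seen.update(extra)
        return Message(self.pid, dst, ssn, payload, extra)

    def receive(self, msg: Message) -> Determinant:
        det = Determinant(msg.src, msg.ssn, self.pid, self.next_rsn)
        self.next_rsn += 1
        self.known.add(det)          # commit info for this delivery, kept in memory
        self.known |= msg.piggyback  # now dependent on the sender, so store its determinants
        # The sender already holds what it just sent, so never echo it back.
        self.peer_has.setdefault(msg.src, set()).update(msg.piggyback)
        return det
```

For example, if P0 sends to P1 and P1 then sends twice to P2, the determinant for P1's delivery is carried only on the first message to P2. Tracking per peer what has already been sent is what keeps the piggybacked data small in this sketch.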
Experiments show that, compared with the Egida protocol, the proposed message logging protocol reduces both message logging overhead and checkpointing overhead by about 10%. Second, because existing coordinated checkpointing protocols usually either block or incur high coordination overhead, the thesis proposes a non-blocking coordinated checkpointing protocol based on message counting. The protocol classifies the run-time state of a process into three types. Exploiting the fact that, in distributed parallel programs, checkpoints are taken far more often than failures occur, it combines piggybacking with a non-blocking execution mechanism: it shifts part of the coordination overhead of checkpointing to the rollback-recovery phase after a failure, and it avoids unnecessary checkpoints by tracking each process's communication within the checkpoint interval. Together these reduce the overall overhead of checkpointing. Experiments show that the coordinated checkpointing overhead of the proposed protocol is 20% to 40% lower than that of the two-phase checkpoint protocol and about 20% lower than that of the distributed snapshot protocol.
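The non-blocking, message-counting idea can also be sketched in Python. This is a generic illustration under stated assumptions, not the protocol from the thesis: the three process-state types are not modeled, the coordinator that starts a new checkpoint epoch is omitted, and the names `Msg`, `Proc`, `take_checkpoint`, `deliver`, and `in_transit` are hypothetical. Each message piggybacks the sender's checkpoint epoch, so no process blocks; each checkpoint saves cumulative per-channel send and receive counts, so in-transit messages are identified only at recovery time; and a process with no traffic since its last checkpoint does not need to write new state.

```python
# Hypothetical sketch of non-blocking coordinated checkpointing with message
# counting; names are illustrative and the thesis's three process states are not modeled.
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Msg:
    src: int
    dst: int
    epoch: int  # sender's checkpoint epoch, piggybacked on every message


@dataclass
class Proc:
    pid: int
    epoch: int = 0
    sent: Counter = field(default_factory=Counter)   # dst -> messages sent so far
    recvd: Counter = field(default_factory=Counter)  # src -> messages delivered so far
    saved: dict = field(default_factory=dict)        # epoch -> (sent, recvd) copies
    skipped: int = 0                                 # checkpoints needing no new state write

    def __post_init__(self) -> None:
        self.saved[0] = (Counter(), Counter())  # the initial state serves as checkpoint 0

    def take_checkpoint(self, epoch: int) -> None:
        """Join checkpoint `epoch` without blocking other processes."""
        if epoch <= self.epoch:
            return  # already part of this (or a later) checkpoint
        last_sent, last_recvd = self.saved[self.epoch]
        if self.sent == last_sent and self.recvd == last_recvd:
            # No traffic in this interval: the previous checkpoint is still valid,
            # so a real system would reuse it instead of writing process state again.
            self.skipped += 1
        self.saved[epoch] = (Counter(self.sent), Counter(self.recvd))
        self.epoch = epoch

    def send(self, dst: int) -> Msg:
        self.sent[dst] += 1
        return Msg(self.pid, dst, self.epoch)

    def deliver(self, msg: Msg) -> None:
        # A message sent after its sender's checkpoint forces a checkpoint before
        # delivery, so the saved cut never records a receive without its send.
        if msg.epoch > self.epoch:
            self.take_checkpoint(msg.epoch)
        self.recvd[msg.src] += 1


def in_transit(procs: list, epoch: int) -> dict:
    """At recovery, derive per-channel counts of messages to replay from sender logs."""
    result = {}
    for i in procs:
        for j in procs:
            if i.pid == j.pid:
                continue
            n = i.saved[epoch][0][j.pid] - j.saved[epoch][1][i.pid]
            assert n >= 0, "orphan message: inconsistent checkpoint cut"
            if n:
                result[(i.pid, j.pid)] = n
    return result
```

If P0 sends a message, both processes then join epoch 1, and P1 delivers the message afterwards, `in_transit` reports one message on channel (0, 1) that would be replayed from P0's sender log during recovery. The sketch trades this recovery-time accounting for checkpoints that never flush or block channels, which pays off when checkpoints are far more frequent than failures.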
【Degree-granting institution】: Hunan University
【Degree level】: Master's
【Year degree conferred】: 2013
【CLC classification number】: TP302.7