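Research on Fault-Tolerant Checkpoint Algorithms and Software Design
Published: 2018-02-26 04:23
Keywords: fault tolerance; unreliable non-FIFO channel; consistent global checkpoint; Windows checkpoint. Source: Shandong University, 2012 master's thesis. Document type: degree thesis.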
[Abstract]: In recent years, distributed systems have been adopted in a growing number of fields, including the military, aviation, and financial systems. As the software designed for distributed systems grows more complex and the number of nodes increases, the probability of failure rises and system reliability declines. If a failure occurs during operation and no protective measures are in place, it may cause serious loss of life and property. Research on fault-tolerant checkpointing therefore has considerable practical significance.
This work was proposed under the Shandong Provincial Natural Science Foundation project "Research and Implementation of Fault-Tolerance Technology for Heterogeneous Distributed Systems Based on Backward Recovery". The thesis first describes the significance and current state of checkpointing research and introduces the basic fault models and basic fault-tolerance components of distributed systems. It then proposes a checkpoint algorithm for unreliable non-FIFO communication channels. On such channels, messages can be lost, received more than once, or delivered out of order. A process may therefore fail to process some messages because they were lost, process some messages several times because they were received repeatedly, or process messages in an order different from the order in which they were sent. These problems lead to incorrect computation results and prevent the processes from taking consistent checkpoints. The algorithm addresses them by assigning a sequence number to each message. During checkpointing, consistent checkpoints are determined from the send sequence numbers and the receive sequence numbers. Comparing the send and receive sequence numbers identifies lost messages, duplicate messages, and out-of-order messages: lost messages are retransmitted, out-of-order messages are saved, and duplicates are discarded. The algorithm enables the system to establish a consistent global checkpoint. The thesis also describes how a checkpoint of a Windows process is taken and restored, divided into saving and restoring the user address space and saving and restoring kernel objects, and simulates checkpointing and recovery of a process in the Visual Studio 2005 environment.
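The sequence-number idea in the abstract can be illustrated with a small sketch. The thesis text itself is not reproduced here, so this is not the author's algorithm: it shows only the general technique of numbering the messages on one channel so that the receiver can tell duplicates, early arrivals, and gaps apart. All names (`Receiver`, `on_receive`, `missing`, `no_orphan`) are hypothetical, and the orphan-message test in `no_orphan` is the usual textbook consistency condition, assumed here rather than taken from the thesis.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Receiving end of one channel (hypothetical illustration)."""
    expected: int = 0                            # next sequence number to deliver
    held: dict = field(default_factory=dict)     # early arrivals, keyed by sequence number
    delivered: list = field(default_factory=list)

    def on_receive(self, seq, payload):
        """Classify one arriving message and return a label for it."""
        if seq < self.expected or seq in self.held:
            return "duplicate"                   # already seen: discard
        if seq > self.expected:
            self.held[seq] = payload             # arrived early: save for later
            return "out_of_order"
        # seq == expected: deliver it, then drain any saved successors
        self.delivered.append(payload)
        self.expected += 1
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            self.expected += 1
        return "delivered"

    def missing(self):
        """Sequence numbers that would have to be retransmitted to close the gaps."""
        if not self.held:
            return []
        return [s for s in range(self.expected, max(self.held))
                if s not in self.held]


def no_orphan(sent_at_checkpoint, expected_at_checkpoint):
    """Assumed consistency test for one channel: the receiver's checkpoint
    must not record a message that the sender's checkpoint has not sent."""
    return expected_at_checkpoint <= sent_at_checkpoint
```

For example, if a receiver gets messages 0, 2, 0, and 1 in that order, it delivers 0, holds 2, discards the second 0, reports 1 as missing in the meantime, and delivers 1 and then 2 once the gap is filled. A sender would use the output of `missing()` to decide what to retransmit.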
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP302.8
Article ID: 1536506
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1536506.html