Research on Checkpoint Fault-Tolerance Technology Based on VxWorks

Published: 2018-11-27 11:00

[Abstract]: Checkpointing is a widely used fault-tolerance technique in distributed and cluster systems. In a message-passing system, the running state of each process is periodically saved to reliable storage as a checkpoint file, so that a failed process can be restored quickly from its stored checkpoint instead of repeating earlier work, which reduces the computation lost. Coordinated checkpointing is one form of the technique: it coordinates the checkpointing of the processes so that the set of checkpoints forms a globally consistent state. A fault-tolerance technique is usually evaluated by its overhead, which, taking the point of failure as the boundary, is divided into failure-free overhead and failure-recovery overhead. Thanks to its globally consistent state, coordinated checkpointing performs well on recovery overhead, but the control messages exchanged between processes while checkpoints are being taken add to the failure-free overhead. This thesis proposes a non-blocking coordinated checkpointing algorithm with O(n) complexity, which uses a globally shared message channel to prevent message reception by the processes from making the system state inconsistent. Unlike the two-phase blocking protocol of traditional blocking coordinated checkpointing algorithms, it uses a single-phase non-blocking approach and, with the help of global shared memory, reduces the coordination-message complexity from the traditional O(n^2) to O(n), thereby lowering the failure-free overhead. Because the approach is non-blocking, a task does not have to suspend its execution or its message sending while it takes a checkpoint, and once the checkpoint is complete it can process subsequently received messages without waiting.
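The abstract gives no protocol details, so the following is only an illustrative Python sketch of why coordinating through shared memory can scale as O(n). It assumes, hypothetically, that an initiator announces a checkpoint round with one shared write and that each of the n tasks notices the new round before handling its next message, saves its state, and records completion with one write of its own. `SharedBoard`, `Task`, `start_round`, and the all-to-all baseline of n(n-1) control messages are assumptions made for the illustration, not the thesis's data structures or VxWorks calls.

```python
from dataclasses import dataclass, field


@dataclass
class SharedBoard:
    """Stand-in for a shared-memory region visible to all tasks (hypothetical)."""
    round_no: int = 0
    done: dict = field(default_factory=dict)   # task id -> last round checkpointed
    writes: int = 0                            # coordination writes performed


@dataclass
class Task:
    tid: int
    state: int = 0
    saved: dict = field(default_factory=dict)  # round -> state captured

    def poll(self, board):
        """Run before handling each message; never waits for other tasks."""
        if board.done.get(self.tid, 0) < board.round_no:
            self.saved[board.round_no] = self.state   # local checkpoint
            board.done[self.tid] = board.round_no     # one completion write
            board.writes += 1


def start_round(board):
    """Initiator announces a new round with a single shared write."""
    board.round_no += 1
    board.writes += 1


def shared_board_writes(n):
    """Coordination writes for one round: 1 announcement + n completions."""
    board = SharedBoard()
    tasks = [Task(tid) for tid in range(n)]
    start_round(board)
    for task in tasks:
        task.poll(board)
    return board.writes


def all_to_all_messages(n):
    """Assumed baseline: every process sends a control message to every other."""
    return n * (n - 1)
```

Under these assumptions, 8 tasks need 9 coordination writes on the shared board, against 56 control messages for the all-to-all baseline.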
Non-blocking operation thus improves the system's processing speed and real-time performance. To keep the algorithm non-blocking, each process stores its checkpoint file independently, so a failure that occurs while checkpoints are being taken can leave the processes' checkpoint files inconsistent; the thesis uses double checkpoint files to avoid this. The non-blocking approach greatly increases the autonomy of the processes, but it also weakens the system's checkpoint state from a strongly consistent global state to a consistent global state, because the checkpoint state may now include in-transit messages. Checkpointing therefore has to be combined with message logging to guarantee that the system state is recoverable. Under the coordinated checkpointing algorithm, the message log only has to hold the messages that follow the checkpoint trigger point, which removes the need for garbage collection. The fault-tolerance scheme is built on VxWorks, an embedded real-time operating system with good reliability and real-time performance. Drawing on that system, the thesis improves the scheme's file storage and message transmission: a tape-style storage scheme raises file storage efficiency and reduces the storage space used, while memory management reduces the amount of data copied through message queues and improves data transfer efficiency. Finally, three simple experiments verify the feasibility of the checkpoint fault-tolerance scheme.
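The double-checkpoint-file idea can be illustrated with a small Python sketch. It is not the thesis's implementation: the file names, the header layout (round number, payload length, CRC32), and the rule of alternating slots by round parity are all assumptions chosen for the example, and it uses ordinary files rather than a VxWorks file system. Each process keeps two slots and overwrites only the older one, so if a failure interrupts a round, every process still holds the previous round intact and recovery can fall back to the newest round that all processes share.

```python
import os
import struct
import zlib

# Hypothetical header: round number, payload length, CRC32 of the payload.
HEADER = struct.Struct("<QII")


def slot_path(directory, slot):
    return os.path.join(directory, f"ckpt_{slot}.bin")


def write_checkpoint(directory, round_no, payload):
    """Write into slot round_no % 2; with consecutive rounds this is the
    older slot, so the previous round's file is left untouched."""
    with open(slot_path(directory, round_no % 2), "wb") as f:
        f.write(HEADER.pack(round_no, len(payload), zlib.crc32(payload)))
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())


def read_slot(directory, slot):
    """Return (round_no, payload), or None if the slot is missing or torn."""
    try:
        with open(slot_path(directory, slot), "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    if len(data) < HEADER.size:
        return None
    round_no, length, crc = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None
    return round_no, payload


def valid_rounds(directory):
    """Map round number -> payload for every intact slot of one process."""
    slots = (read_slot(directory, s) for s in (0, 1))
    return dict(s for s in slots if s is not None)


def recovery_line(directories):
    """Newest round that every process holds intact, plus each payload."""
    per_process = [valid_rounds(d) for d in directories]
    if not per_process:
        return None
    common = set.intersection(*(set(r) for r in per_process))
    if not common:
        return None
    round_no = max(common)
    return round_no, [r[round_no] for r in per_process]
```

If one process completes round 3 while another fails partway through writing it, `recovery_line` returns round 2 for both, because the interrupted write only damaged the slot that held round 1.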
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP302.8


Article ID: 2360503

Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2360503.html
