Research on Key Technologies of Coordinated Rollback Recovery for Cloud Platforms
Topic: fault tolerance + cloud computing; Source: Harbin Institute of Technology, 2014 master's thesis
[Abstract]: Cloud computing builds on traditional technologies while adding new ideas. By using clusters of commodity computers to process large volumes of data, it is becoming a popular computing model. However, the fault tolerance of cloud computing systems is increasingly a bottleneck and urgently needs to be improved.
The rollback recovery techniques this work addresses, including coordinated checkpointing and message logging, are not new and are already fairly widely used. In cloud computing, however, they still fall short: most provide fault tolerance only for the virtual machine instances of the cloud platform. This thesis therefore studies rollback-recovery fault tolerance for cloud computing, with the aim of providing system-wide fault tolerance in a cloud environment.
The coordinated rollback recovery system implemented in this thesis periodically takes semi-coordinated checkpoints. It avoids orphan messages by synchronizing the virtual machines with one another and uses a message-flushing protocol to eliminate in-transit messages, so that each checkpoint is globally consistent. When a virtual machine on the cloud platform fails, the system detects the error quickly and performs rollback recovery on the platform. In general, the virtual machine instances that a cloud platform allocates to different users are independent of one another, so rolling back every instance after a failure can waste a large amount of computation. To reduce the number of virtual machines involved in a rollback, the thesis proposes a log-based coordinated checkpointing algorithm: when a virtual machine fails, only the virtual machines that have a dependency relationship with it are recovered. Unlike traditional fault-tolerance techniques, the platform is transparent to applications and operating systems: every functional module except the control module on the cloud platform management server is implemented in the virtual machine privileged domain, so neither the application software nor the operating system has to be modified.
After a comparison of various cloud platforms, the open-source software CloudStack and XenServer were chosen to build a small IaaS cloud platform on which the designed and implemented coordinated rollback recovery system was tested. The results show that the coordinated rollback recovery algorithms provide fault tolerance for the cloud platform. In addition, semi-coordinated checkpointing reduces user waiting time, and the log-based coordinated rollback recovery algorithm reduces the number of virtual machines involved in a rollback.
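The idea of rolling back only the virtual machines that depend on the failed one can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, which the abstract does not spell out. It assumes a hypothetical log of `(sender, receiver)` pairs recorded since the last global checkpoint, and it assumes that any exchanged message makes two virtual machines mutually dependent, so the rollback set is the connected component that contains the failed virtual machine. The function name `rollback_set` and the log format are illustrative only.

```python
from collections import defaultdict, deque


def rollback_set(failed_vm, message_log):
    """Return the set of VMs to roll back when failed_vm fails.

    message_log is an iterable of (sender, receiver) pairs recorded since
    the last global checkpoint (a hypothetical log format).
    """
    # Assumption: a message in either direction makes two VMs mutually
    # dependent, so the dependency graph is treated as undirected.
    neighbours = defaultdict(set)
    for sender, receiver in message_log:
        neighbours[sender].add(receiver)
        neighbours[receiver].add(sender)

    # Breadth-first search collects every VM reachable from the failed one.
    to_roll_back = {failed_vm}
    queue = deque([failed_vm])
    while queue:
        vm = queue.popleft()
        for peer in neighbours[vm]:
            if peer not in to_roll_back:
                to_roll_back.add(peer)
                queue.append(peer)
    return to_roll_back


if __name__ == "__main__":
    # vm1..vm3 exchange messages; vm4 and vm5 belong to an unrelated group.
    log = [("vm1", "vm2"), ("vm2", "vm3"), ("vm4", "vm5")]
    print(sorted(rollback_set("vm1", log)))  # ['vm1', 'vm2', 'vm3']
```

In this example a failure of `vm1` leaves `vm4` and `vm5` untouched, which is the saving the abstract attributes to the log-based approach. A real system would likely track message direction and checkpoint intervals instead of a flat undirected log.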
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.09