Design and Implementation of a MapReduce Fault Recovery Mechanism
Topics: cloud computing + MapReduce. Source: master's thesis, Huazhong University of Science and Technology, 2012
[Abstract]: As large-scale data processing grows, computing clusters keep getting larger and the demands on system reliability keep rising. In clusters of this size, failures of many kinds are unavoidable, and task failures and node failures are especially common while MapReduce jobs run. The existing fault handling in MapReduce has shortcomings, so studying and designing a fault recovery mechanism for the MapReduce computing model is worthwhile.

This thesis first describes the concept, characteristics, and current state of cloud computing and briefly introduces the characteristics of Hadoop clusters. On that basis it explains the significance of research on fault recovery for large-scale clusters and surveys the state of research in China and abroad.

It then introduces the MapReduce computing model: its basic idea, working principle, and task scheduling flow. Building on that, it presents the main fault types in the MapReduce computing model and analyzes in depth how each type is handled.

Next, the thesis adds an automatic restart module for nodes to the existing MapReduce computing model, so that nodes can restart quickly after a failure. It further designs and implements a recovery mechanism for failed tasks, so that a failed task, once rescheduled, does not have to run from the beginning but can continue from the progress it had reached before the failure. With these optimizations, the cluster recovers more quickly from failures that occur during computation.

Finally, the optimized system is tested and evaluated for function and performance. The results show that its fault recovery mechanism achieves the intended functionality and outperforms the original MapReduce computing model.
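The abstract does not say how a rescheduled task picks up its earlier progress. One common way to get that behavior is periodic checkpointing, and the sketch below illustrates the idea under that assumption; it is not the thesis's implementation and does not use any Hadoop API. The function names, the JSON checkpoint format, the local checkpoint file, and the `fail_at` test hook are all hypothetical.

```python
import json
import os
import tempfile


class SimulatedTaskFailure(Exception):
    """Raised to imitate a task crashing partway through its input split."""


def load_checkpoint(path):
    """Return (next_offset, partial_counts); start from scratch if none exists."""
    if not os.path.exists(path):
        return 0, {}
    with open(path, "r", encoding="utf-8") as f:
        state = json.load(f)
    return state["next_offset"], state["counts"]


def save_checkpoint(path, next_offset, counts):
    """Write to a temp file, then rename, so a crash mid-write leaves the old checkpoint intact."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump({"next_offset": next_offset, "counts": counts}, f)
    os.replace(tmp_path, path)


def run_word_count_task(records, checkpoint_path, checkpoint_every=2, fail_at=None):
    """Count words in `records`, resuming from the last checkpoint if one exists.

    `fail_at` is a hypothetical test hook: the task raises just before
    processing the record at that index.
    Returns (counts, records_processed_in_this_attempt).
    """
    start, counts = load_checkpoint(checkpoint_path)
    processed = 0
    for offset in range(start, len(records)):
        if fail_at is not None and offset == fail_at:
            raise SimulatedTaskFailure(f"crashed before record {offset}")
        for word in records[offset].split():
            counts[word] = counts.get(word, 0) + 1
        processed += 1
        # Persist progress every `checkpoint_every` records.
        if (offset + 1) % checkpoint_every == 0:
            save_checkpoint(checkpoint_path, offset + 1, counts)
    save_checkpoint(checkpoint_path, len(records), counts)
    return counts, processed


if __name__ == "__main__":
    records = ["a b", "b c", "c d", "d e", "e f"]
    ckpt = os.path.join(tempfile.mkdtemp(), "task_0.ckpt")
    try:
        run_word_count_task(records, ckpt, fail_at=3)  # first attempt crashes
    except SimulatedTaskFailure as exc:
        print("first attempt failed:", exc)
    counts, processed = run_word_count_task(records, ckpt)  # rescheduled attempt
    print("records processed after restart:", processed)
    print("final counts:", counts)
```

In this sketch the rescheduled attempt restarts from the last saved offset, so only the records after that checkpoint are reprocessed. Because the partial counts are restored from the same checkpoint, work done between the last checkpoint and the crash is repeated but never counted twice.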
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP306
Article ID: 1954926
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1954926.html