Research and Implementation of a MapReduce Fault-Tolerance Method Based on Intermediate-Result Checkpoints
Published: 2018-04-28 11:48
Topics: checkpoint fault tolerance + intermediate results; Source: Inner Mongolia University, 2017 master's thesis
【Abstract】: With the rapid development of the Internet, the volume of data generated online has grown explosively, and traditional storage and computing models can no longer meet applications' storage and computing demands. Cloud computing, with its strong distributed-processing capabilities, has become the most popular data-processing technology today, and MapReduce, an efficient parallel computing framework, is increasingly used for big-data processing. The MapReduce model currently faces two common types of failure: task failures and node failures. For task failures, MapReduce uses re-execution: a failed task is reassigned and run again. Every re-execution wastes considerable computing resources, lengthens the average task completion time, and lowers computational efficiency. Node failures are generally divided into Master node failures and Worker node failures. For Master node failures, MapReduce commonly uses a duplex (redundant master) approach. For Worker node failures, the intermediate results produced by Map tasks are stored on the Worker node itself, so a failure loses them and tasks that had already completed must be reassigned and re-executed. The MapReduce model currently lacks a suitable, efficient fault-tolerance method for this type of failure.
To address the low fault-tolerance efficiency and wasted computing resources caused by these shortcomings, this thesis applies checkpointing: each task's execution state and intermediate results are saved as checkpoint files, so that intermediate results are not lost and jobs recover faster when they are restored from checkpoint files after a failure. The thesis makes three main contributions.
(1) Analysis of the shortcomings of MapReduce fault tolerance in the Hadoop source code: by analyzing the Hadoop source code, the thesis studies how task failures and node failures are handled during MapReduce execution and the drawbacks of that handling, providing an analytical basis for improving MapReduce's current fault-tolerance approach.
(2) Design and implementation of a checkpoint fault-tolerance mechanism: for the task and node failures common in MapReduce computation, the thesis designs and implements a checkpoint mechanism that saves each task's execution state and the metadata of its intermediate results as checkpoint files; when a task is reassigned, the corresponding checkpoint file is used to resume it quickly. A local checkpoint mechanism handles task failures, and remote checkpoint and metadata-query checkpoint mechanisms handle node failures.
(3) Testing of the checkpoint mechanism: after implementing the mechanism, the thesis sets up a Hadoop cluster, writes application programs, and injects faults into them to verify whether the checkpoint mechanism provides effective fault tolerance when failures occur, and measures its fault-tolerance efficiency experimentally.
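The local checkpoint for task failures can be illustrated with a minimal Python sketch. It is not the thesis's implementation and does not use Hadoop's API: the word-count task, the JSON checkpoint format, the checkpoint interval and the `fail_at` fault-injection parameter are all hypothetical. The task periodically saves the index of the next unprocessed record together with its partial counts; when the failed task is retried, it resumes from the last checkpoint instead of starting over.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the task state to a temp file, then rename it over the old checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # the rename is atomic on POSIX file systems


def load_checkpoint(path):
    """Return the saved task state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_record": 0, "counts": {}}


def run_map_task(records, ckpt_path, interval=2, fail_at=None):
    """Hypothetical word-count map task that checkpoints every `interval` records.

    `fail_at` is a hypothetical fault-injection hook: the attempt raises
    RuntimeError just before processing that record index.
    Returns (counts, number of records processed by this attempt).
    """
    state = load_checkpoint(ckpt_path)  # resume point for a retried attempt
    processed = 0
    for i in range(state["next_record"], len(records)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"injected task failure at record {i}")
        for word in records[i].split():
            state["counts"][word] = state["counts"].get(word, 0) + 1
        state["next_record"] = i + 1
        processed += 1
        if state["next_record"] % interval == 0:
            save_checkpoint(ckpt_path, state)
    save_checkpoint(ckpt_path, state)  # final checkpoint once the task completes
    return state["counts"], processed


def demo():
    records = ["a b", "b c", "c d", "d e", "e f"]
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "map_task_0.ckpt")
        try:
            run_map_task(records, ckpt, fail_at=3)  # first attempt fails at record 3
        except RuntimeError:
            pass  # here the framework would reschedule the task
        return run_map_task(records, ckpt)  # retry resumes from the last checkpoint


if __name__ == "__main__":
    counts, processed = demo()
    print(processed, counts)
```

In this run the retried attempt processes only 3 of the 5 records (records 0 and 1 are restored from the checkpoint), while plain re-execution would process all 5. Writing to a temporary file and renaming it keeps a crash during a checkpoint write from leaving a half-written checkpoint behind.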
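For Worker node failures, the remote and metadata-query checkpoints can be sketched in the same spirit, again with hypothetical names and a deliberately simplified model. An in-memory registry stands in for the remote checkpoint store (in a real deployment it might be a file on HDFS, which is an assumption here) and records every location holding a copy of a completed map task's intermediate output. After a node fails, the scheduler queries this metadata and re-executes only the tasks whose output has no surviving copy, whereas plain re-execution reruns every completed map task from the failed node.

```python
from dataclasses import dataclass, field


@dataclass
class OutputMeta:
    """Hypothetical checkpoint metadata for one completed map task's output."""
    task_id: str
    locations: set = field(default_factory=set)  # nodes or stores holding a copy


class MetadataStore:
    """In-memory stand-in for a remote checkpoint store (assumed, not Hadoop's API)."""

    def __init__(self):
        self._meta = {}

    def record(self, task_id, location):
        """Register that `location` holds a copy of `task_id`'s intermediate output."""
        self._meta.setdefault(task_id, OutputMeta(task_id)).locations.add(location)

    def surviving_location(self, task_id):
        """Metadata query: return a location that still holds the output, or None."""
        meta = self._meta.get(task_id)
        return min(meta.locations) if meta and meta.locations else None

    def drop_node(self, node):
        """Forget every copy that lived on a failed node."""
        for meta in self._meta.values():
            meta.locations.discard(node)


def tasks_to_rerun(completed, store, failed_node, use_checkpoints=True):
    """Return the completed map tasks that must be re-executed after `failed_node` fails.

    `completed` maps task_id -> node that ran it. Without checkpoints every task
    that ran on the failed node is rerun; with them, only tasks whose output has
    no surviving copy are rerun.
    """
    on_failed = sorted(t for t, node in completed.items() if node == failed_node)
    if not use_checkpoints:
        return on_failed
    store.drop_node(failed_node)
    return [t for t in on_failed if store.surviving_location(t) is None]


if __name__ == "__main__":
    completed = {"m0": "w1", "m1": "w1", "m2": "w2"}
    store = MetadataStore()
    for task, node in completed.items():
        store.record(task, node)  # local copy on the worker that ran the task
    store.record("m0", "remote")  # only m0's output was also checkpointed remotely
    print(tasks_to_rerun(completed, store, "w1", use_checkpoints=False))  # ['m0', 'm1']
    print(tasks_to_rerun(completed, store, "w1"))  # ['m1']
```

Here m0 is not rerun because the metadata query finds its remote copy, and m2 is unaffected because it ran on w2. How often outputs are copied remotely trades checkpoint overhead against recovery work; the sketch does not model that cost.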
【Degree-granting institution】: Inner Mongolia University
【Degree level】: Master's
【Year awarded】: 2017
【Classification numbers】: TP302.8; TP311.13
Article ID: 1815158
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1815158.html