Research on MapReduce Fault-Tolerance Techniques in Cloud Environments
Published: 2018-08-25 10:29
[Abstract]: Cloud computing has become one of the most important technologies in today's computer industry. With its rapid development, data has gradually shifted from traditional structured data toward semi-structured and unstructured data, and data volumes have grown enormously. Traditional database technology can no longer cope with data at this scale, so processing such big data has become an urgent problem. In 2004, Google proposed its solution, MapReduce, to meet the challenges that big data poses in the cloud era. Simply put, MapReduce is a programming model for batch, parallel processing of massive data sets. It not only addresses the performance of processing massive data but also simplifies how programmers develop distributed parallel programs. More importantly, MapReduce handles scalability and reliability well, which is its biggest advantage over traditional databases. A wide range of research, both domestic and international, has grown up around this new programming framework, and MapReduce's fault tolerance has remained one of the hot topics. Existing fault-tolerance schemes can be grouped into two approaches: re-execution and backup. Both perform recovery after a failure has been discovered, but if a failure is not detected promptly, neither can take full effect. This thesis therefore studies MapReduce fault tolerance from a new angle: how to detect failed nodes in MapReduce more quickly and accurately.
To this end, the thesis proposes two ideas: an adaptive expiry time and a credit-based detection model. The adaptive expiry time replaces the strict, fixed heartbeat expiry in a MapReduce cluster: the execution time of each job is first estimated, and the expiry time then adapts to that estimate. At run time, if the JobTracker receives no heartbeat from a node within the adaptive expiry time, that node is considered to have failed. The credit-based detection model assigns each node a credit value and evaluates it in real time from failed remote fetches of map output by reduce tasks; when too many failures decay a node's credit to a preset lower bound, the node is considered to have failed.
Extensive experimental data show that both schemes clearly outperform the original Hadoop cluster: when a node in the cluster fails, they greatly shorten the time needed to discover the failure. A comparison of the two schemes further shows that the adaptive expiry time favors short jobs, while the credit-based detection model is better suited to large jobs. Used together with existing fault-tolerance techniques, the two schemes give a Hadoop cluster better overall fault tolerance: it can not only locate failures quickly but also recover from them quickly. The main contributions of this thesis are the two mechanisms, the adaptive expiry time and the credit-based detection model, which also broaden the research directions for Hadoop fault tolerance.
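The adaptive expiry time described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the scaling factor `ALPHA`, the bounds, and the function names are assumptions; only the 600-second default reflects classic Hadoop's fixed TaskTracker expiry interval.

```python
# Sketch of the adaptive-expiry idea: instead of a fixed heartbeat expiry
# (classic Hadoop uses 600 s), scale the expiry to an estimate of the
# job's execution time, clamped to illustrative bounds.

DEFAULT_EXPIRY = 600.0   # seconds; classic Hadoop's fixed expiry interval
MIN_EXPIRY = 30.0        # illustrative lower bound (an assumption)
ALPHA = 0.1              # illustrative scaling factor (an assumption)

def adaptive_expiry(estimated_job_seconds: float) -> float:
    """Derive a heartbeat expiry from the estimated job execution time."""
    return max(MIN_EXPIRY, min(DEFAULT_EXPIRY, ALPHA * estimated_job_seconds))

def node_failed(last_heartbeat: float, now: float,
                estimated_job_seconds: float) -> bool:
    """A node is declared failed if no heartbeat arrived within the expiry."""
    return (now - last_heartbeat) > adaptive_expiry(estimated_job_seconds)
```

For a short job (estimated at 100 s) the expiry shrinks toward the lower bound, so a dead node is noticed far sooner than with the fixed 600-second interval; for very long jobs the expiry is capped at the default.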
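The credit-based detection model can likewise be sketched. The decay factor, the credit floor, and the reset-on-success behavior here are illustrative assumptions, not parameters taken from the thesis; only the trigger (failed remote fetches of map output by reduce tasks) comes from the abstract.

```python
class NodeCredit:
    """Sketch of the credit-based model: each node starts with full credit;
    every failed remote fetch of its map output by a reduce task decays the
    credit, and the node is declared failed once credit reaches a floor."""

    def __init__(self, initial: float = 1.0,
                 decay: float = 0.5, floor: float = 0.1):
        self.credit = initial
        self.decay = decay    # multiplicative decay per failed fetch (assumed)
        self.floor = floor    # preset lower bound from the abstract (value assumed)

    def report_fetch_failure(self) -> None:
        """A reduce task failed to fetch this node's map output."""
        self.credit *= self.decay

    def report_fetch_success(self) -> None:
        """A successful fetch restores full credit (an assumption)."""
        self.credit = 1.0

    def failed(self) -> bool:
        return self.credit <= self.floor
```

With these values, four consecutive failed fetches (1.0 → 0.5 → 0.25 → 0.125 → 0.0625) drop the node below the floor and mark it failed, while sporadic failures interleaved with successes leave it healthy.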
Degree-granting institution: Shanghai Jiao Tong University
Degree level: Master's
Year conferred: 2012
CLC classification: TP302.8
Article ID: 2202608
Link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2202608.html