Research on MapReduce Fault-Tolerance Techniques in Cloud Environments

Published: 2018-08-25 10:29
[Abstract]: Cloud computing has become one of the most important technologies in the computer industry today. With the rapid development of cloud technology, data has gradually shifted from traditional structured data to semi-structured and unstructured data, and its volume has grown enormously. Traditional database technology can no longer cope with data at this scale, so how to process such big data has become an urgent problem. In 2004, Google proposed its solution, MapReduce, to meet the challenges that big data poses in the cloud era.

Simply put, MapReduce is a programming model for batch, parallel processing of massive data. It addresses the performance problem of processing massive data and also simplifies how programmers develop distributed parallel programs. More importantly, MapReduce handles scalability and reliability well, which is its greatest advantage over traditional databases. A wide range of research has been carried out around this emerging programming framework, and the fault tolerance of MapReduce has consistently been one of the hot topics. Existing approaches to fault tolerance fall mainly into two categories: re-execution and backup. These approaches perform recovery after a failure has been discovered, but if the failure is not detected in time, they cannot be fully effective. This thesis therefore studies MapReduce fault tolerance from a new angle: how to detect failed nodes in MapReduce faster and more accurately.

To address this problem, the thesis proposes two ideas: an adaptive expiry time and a reputation-based detection model. The adaptive expiry time is intended to replace the strict, fixed expiry time used in MapReduce clusters. To do so, the execution time of each job is first estimated, and the expiry time is then adapted to that estimate. At run time, if the JobTracker receives no heartbeat from a node within the adaptive expiry time, that node is considered to have failed. The reputation-based detection model assigns each node a reputation value and uses failed attempts by reduce tasks to remotely fetch map data to evaluate the node's reputation in real time. If a node's reputation value decays to a preset lower bound because of too many fetch failures, the node is considered to have failed.

Extensive experimental data show that both proposed schemes clearly outperform the original Hadoop cluster. When a node in the cluster fails, the schemes greatly shorten the time needed to discover the failure compared with the original scheme. Comparative experiments between the two schemes further show that the adaptive expiry time favors short jobs, while the reputation-based detection model is better suited to large jobs. Used together with existing fault-tolerance techniques, the two schemes give a Hadoop cluster better fault tolerance: it can both locate failures quickly and recover from them quickly. The main contributions of the thesis are the two mechanisms, adaptive expiry time and reputation-based detection model, which also broaden the research directions for Hadoop fault tolerance.
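The two detection ideas in the abstract can be illustrated with a small Python sketch. The abstract gives no formulas, so everything concrete here is an assumption made for illustration: the function and class names, the rule that the expiry time is a fixed fraction of the estimated job runtime clamped between a floor and a ceiling, and the multiplicative decay of a node's reputation on each failed map-output fetch. It is not the thesis's implementation and does not use any Hadoop API.

```python
from dataclasses import dataclass, field


def adaptive_timeout(estimated_runtime_s, fraction=0.1, floor_s=10.0, ceiling_s=600.0):
    """Derive a heartbeat expiry time from a job's estimated runtime.

    Assumed rule (not from the thesis): a fixed fraction of the estimate,
    clamped so that short jobs still tolerate brief delays and long jobs
    do not wait indefinitely.
    """
    return max(floor_s, min(ceiling_s, fraction * estimated_runtime_s))


def is_expired(last_heartbeat_s, now_s, timeout_s):
    """A node counts as failed once its heartbeat is older than the timeout."""
    return now_s - last_heartbeat_s > timeout_s


@dataclass
class ReputationTracker:
    """Per-node reputation that decays on each failed map-output fetch."""
    initial: float = 1.0       # starting reputation (assumed)
    decay: float = 0.5         # multiplicative penalty per failure (assumed)
    lower_bound: float = 0.2   # preset lower limit that marks a node failed
    scores: dict = field(default_factory=dict)

    def report_fetch_failure(self, node):
        # A reduce task could not fetch map data from `node`: lower its score.
        score = self.scores.get(node, self.initial) * self.decay
        self.scores[node] = score
        return score

    def is_failed(self, node):
        # Nodes with no reported failures keep the initial reputation.
        return self.scores.get(node, self.initial) <= self.lower_bound


if __name__ == "__main__":
    timeout = adaptive_timeout(estimated_runtime_s=1000)
    print("expiry time:", timeout)
    print("expired:", is_expired(last_heartbeat_s=0, now_s=150, timeout_s=timeout))

    tracker = ReputationTracker()
    for _ in range(3):
        tracker.report_fetch_failure("node-7")
    print("node-7 failed:", tracker.is_failed("node-7"))
```

Under these assumptions the trade-off reported in the abstract is easy to see: a short job gets a short expiry time and so detects a silent node quickly, whereas the reputation tracker needs several reduce-side fetch failures before it reacts, which occurs more readily in large jobs with many reduce tasks.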
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP302.8

