Research on Data Allocation Strategies for the MapReduce Model
Published: 2018-05-22 13:28
Topics: cloud computing + Hadoop; Source: Huazhong University of Science and Technology, master's thesis, 2013
[Abstract]: Since cloud computing emerged in 2007, it has become a popular concept in the IT industry in China and abroad and has attracted wide attention. With the Internet developing rapidly and data volumes growing sharply, storing and processing massive data quickly and effectively has become an urgent problem, and this need is the original driving force behind cloud computing. Cloud computing itself, however, is only a way of thinking: hardware provides the necessary environment, but a programming model that can support the cloud computing idea matters more. The MapReduce parallel programming model proposed by Google provides the software support for processing massive data in the cloud.

Hadoop works in a reliable, efficient, and scalable way, and within a few years it became the mainstream open-source cloud computing platform. It is still a relatively young platform with shortcomings in many areas, so improving it is necessary. This thesis studies the MapReduce parallel programming model on the Hadoop platform in depth and proposes a solution to the unbalanced distribution of the intermediate data output by the Map side. The solution handles the problem with two MapReduce stages. The first stage samples the source data set in parallel and estimates the data information from the sample; based on this estimate, an allocation strategy called LAB is proposed that distributes the intermediate data evenly. The second stage runs the MapReduce job according to that allocation strategy.

Experiments show that the scheme reduces job running time and balances the load of the Reduce-side input data, which demonstrates the feasibility and advantages of the improved scheme. The scheme makes full use of computing resources, avoids wasting them, and improves program execution efficiency.
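The abstract does not say how LAB computes its assignment, so the following is only a minimal sketch of the two-stage idea under stated assumptions, not the thesis's algorithm. It assumes the "data information" estimated in stage one is a per-key record count, and it swaps in a plain greedy largest-first heuristic for the allocation step. All function names (`sample_key_counts`, `build_partition_plan`, `partition`) are hypothetical, and the code runs in plain Python rather than on Hadoop.

```python
import heapq
import random
import zlib
from collections import Counter


def sample_key_counts(records, key_fn, rate, seed=0):
    """Stage 1 (assumed form): sample records and estimate per-key counts."""
    rng = random.Random(seed)
    counts = Counter()
    for rec in records:
        # Keep each record with probability `rate`.
        if rng.random() < rate:
            counts[key_fn(rec)] += 1
    # Scale the sampled counts back up to estimate the full data set.
    return {key: count / rate for key, count in counts.items()}


def build_partition_plan(estimated_counts, num_reducers):
    """Greedy stand-in for the allocation step (not the LAB strategy itself).

    Keys are visited from heaviest to lightest, and each one is assigned
    to the reducer with the smallest estimated load so far.
    """
    heap = [(0.0, reducer) for reducer in range(num_reducers)]
    heapq.heapify(heap)
    plan = {}
    ordered = sorted(estimated_counts.items(),
                     key=lambda kv: (-kv[1], str(kv[0])))
    for key, weight in ordered:
        load, reducer = heapq.heappop(heap)
        plan[key] = reducer
        heapq.heappush(heap, (load + weight, reducer))
    return plan


def partition(key, plan, num_reducers):
    """Stage 2: route a key by the plan; hash keys the sample never saw."""
    if key in plan:
        return plan[key]
    return zlib.crc32(str(key).encode("utf-8")) % num_reducers
```

In a real Hadoop job the stage-two lookup would live in a custom partitioner; here `partition` only shows the routing decision. The fallback to hashing covers keys that the sample missed, which is an added assumption rather than something the abstract states.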
【學(xué)位授予單位】:華中科技大學(xué)
【學(xué)位級(jí)別】:碩士
【學(xué)位授予年份】:2013
【分類(lèi)號(hào)】:TP333;TP311.1
【參考文獻(xiàn)】
相關(guān)期刊論文 前3條
1 陳康;鄭緯民;;云計(jì)算:系統(tǒng)實(shí)例與研究現(xiàn)狀[J];軟件學(xué)報(bào);2009年05期
2 李玉林;董晶;;基于Hadoop的MapReduce模型的研究與改進(jìn)[J];計(jì)算機(jī)工程與設(shè)計(jì);2012年08期
3 孫廣中;肖鋒;熊曦;;MapReduce模型的調(diào)度及容錯(cuò)機(jī)制研究[J];微電子學(xué)與計(jì)算機(jī);2007年09期
Article ID: 1922251
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1922251.html