Research on Performance Optimization of the MapReduce Model

Published: 2018-05-06 22:43

Topic: MapReduce + dynamic scheduling; Source: Zhengzhou University, 2017 master's thesis

[Abstract]: With the rapid development of the Internet, cloud computing, and the Internet of Things, new applications such as e-commerce, e-government, and social networking have made daily life and work far more convenient. They have also diversified the ways in which data is generated, and data volume is growing explosively. In the big data era, MapReduce has become the mainstream model for processing massive data sets because of its efficiency, scalability, and simplicity. However, the existing data distribution mechanism of MapReduce is prone to input data skew: a few nodes receive most of the data, so the load differs from node to node. Most of the massive data sets processed in practice follow a skewed distribution, namely the Zipf distribution, so different data items correspond to unequal numbers of records. In addition, data belonging to the same partition tends to converge on low-performance nodes, so job execution time differs across nodes. For data-intensive tasks, pulling data causes heavy disk access and contention for limited network bandwidth, both of which become bottlenecks. Data skew is one of the key problems in MapReduce performance optimization. To address it, this thesis proposes a load-balancing optimization mechanism for MapReduce based on online sampling and partitioning. Before a task starts, the mechanism samples and analyzes the source data to predict its distribution characteristics. Based on those characteristics, it dynamically invokes different data partition optimization strategies. During task execution, it monitors the load of every node in real time and dynamically optimizes the corresponding data partitioning strategy.
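The sample-then-partition idea described above can be illustrated with a small simulation. The sketch below is not the thesis's mechanism. It assumes a Zipf-like key distribution, a fixed sampling rate, and a simple greedy assignment (heaviest sampled key to the currently lightest reducer), all of which are illustrative choices. The function names `zipf_keys`, `hash_partition`, and `sampled_partition` are hypothetical.

```python
import random
from collections import Counter


def zipf_keys(n_records, n_keys, s=1.2, seed=0):
    """Draw keys whose frequencies follow a Zipf-like law (rank r has weight 1/r^s)."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** s) for rank in range(1, n_keys + 1)]
    return rng.choices(range(n_keys), weights=weights, k=n_records)


def hash_partition(keys, n_reducers):
    """Baseline: send each key to reducer (key mod n_reducers) and count records."""
    loads = [0] * n_reducers
    for key in keys:
        loads[key % n_reducers] += 1
    return loads


def sampled_partition(keys, n_reducers, sample_rate=0.05, seed=1):
    """Sample the input, estimate key frequencies, then balance keys greedily."""
    rng = random.Random(seed)
    sample = [key for key in keys if rng.random() < sample_rate]
    estimated = Counter(sample)

    assignment = {}
    estimated_load = [0] * n_reducers
    # Heaviest sampled keys first, each to the reducer with the lowest estimated load.
    for key, count in estimated.most_common():
        target = min(range(n_reducers), key=estimated_load.__getitem__)
        assignment[key] = target
        estimated_load[target] += count

    loads = [0] * n_reducers
    for key in keys:
        # Keys never seen in the sample fall back to the baseline rule.
        loads[assignment.get(key, key % n_reducers)] += 1
    return loads


if __name__ == "__main__":
    keys = zipf_keys(n_records=20000, n_keys=1000)
    print("hash partition loads:   ", hash_partition(keys, 4))
    print("sampled partition loads:", sampled_partition(keys, 4))
```

With skewed keys, the baseline tends to overload whichever reducer receives the most frequent key together with its share of the remaining keys, while the sampled assignment spreads the heavy keys out. The sketch keeps all records of one key on one reducer, so it cannot help when a single key alone exceeds a fair share of the data.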
To improve MapReduce performance in heterogeneous environments, this thesis proposes DTHE (Dynamic MapReduce scheduling based on the Time-aware of node jobs in Heterogeneous Environments), a dynamic MapReduce scheduling strategy based on awareness of node job time. Before a job executes, DTHE marks some tasks as node sample tasks and processes them first. While other tasks run, it analyzes the sample tasks, predicts node performance and data distribution characteristics, and dynamically adopts a corresponding scheduling strategy. While the job runs, it monitors node task status in real time and pulls the data for each node's next task into local memory in advance. Experimental results show that, in a heterogeneous environment, DTHE shortens job execution time by 5.1% and reduces disk I/O, effectively improving MapReduce performance.
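The general idea of using sample tasks to estimate node performance and then scheduling by expected finish time can be sketched as follows. This is a simplified illustration under stated assumptions, not the DTHE algorithm itself: it assumes one sample task per node, a constant per-node throughput, and a greedy earliest-finish-time rule, and it does not model data prefetching. The names `estimate_speeds`, `time_aware_schedule`, and `round_robin_makespan` are hypothetical.

```python
def estimate_speeds(sample_sizes, sample_durations):
    """Estimate per-node throughput (data units per second) from one sample task per node."""
    return {
        node: sample_sizes[node] / sample_durations[node]
        for node in sample_durations
    }


def time_aware_schedule(task_sizes, speeds):
    """Assign each task to the node expected to finish it earliest.

    Returns the per-node plan and the estimated makespan in seconds.
    """
    finish = {node: 0.0 for node in speeds}
    plan = {node: [] for node in speeds}
    # Largest tasks first, so small tasks can fill the remaining gaps.
    for size in sorted(task_sizes, reverse=True):
        node = min(finish, key=lambda n: finish[n] + size / speeds[n])
        finish[node] += size / speeds[node]
        plan[node].append(size)
    return plan, max(finish.values())


def round_robin_makespan(task_sizes, speeds):
    """Baseline that ignores node speed and hands out tasks in turn."""
    nodes = list(speeds)
    finish = {node: 0.0 for node in nodes}
    for index, size in enumerate(task_sizes):
        node = nodes[index % len(nodes)]
        finish[node] += size / speeds[node]
    return max(finish.values())


if __name__ == "__main__":
    # Hypothetical measurements: a 64-unit sample task took 2 s on one node, 8 s on the other.
    speeds = estimate_speeds({"fast": 64, "slow": 64}, {"fast": 2.0, "slow": 8.0})
    tasks = [64] * 10
    plan, makespan = time_aware_schedule(tasks, speeds)
    print("tasks per node:", {node: len(sizes) for node, sizes in plan.items()})
    print("time-aware makespan:", makespan)                          # 16.0
    print("round-robin makespan:", round_robin_makespan(tasks, speeds))  # 40.0
```

In this toy setup the slow node is four times slower, so the time-aware plan gives it two of the ten tasks instead of five. A real scheduler would also have to refresh its speed estimates as tasks complete, which the sketch leaves out.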
[Degree-granting institution]: Zhengzhou University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13

Thesis citation: 丁雷道; Research on Performance Optimization of the MapReduce Model [D]; Zhengzhou University; 2017


Article ID: 1854258


Article link: http://sikaile.net/shoufeilunwen/xixikjs/1854258.html
