Research on MapReduce Performance Optimization Based on Hadoop

Published: 2018-05-21 19:36

Topics: MapReduce + load balancing; Source: Nanjing University of Posts and Telecommunications, master's thesis, 2017

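【Abstract】: With the continuous development of Internet technology, networks and enterprise production must process ever more data, and cloud computing has become the prevailing computing model for big-data processing. Hadoop, an open-source cloud computing platform, quickly became the mainstream technology for big-data processing. As Hadoop clusters have come into wide use, their performance has become a major concern. Load balancing has an important impact on cluster performance and is the focus of this thesis, which studies and analyzes the load-balancing problems that arise while MapReduce runs, with the goal of optimizing cluster performance.

In a heterogeneous environment, nodes differ in computing power, so MapReduce task scheduling easily leaves node workloads unbalanced: individual nodes run for too long, which in turn lengthens the response time of the whole job. To address this, the thesis proposes a load-balancing-based task scheduling algorithm. By analyzing the characteristics of task execution and the performance of nodes in a heterogeneous cluster, the algorithm derives a load-balance measure for task scheduling. This measure guides task allocation so that each node receives a computational load that matches its performance, and a node communication model established during task execution lets the load be adjusted dynamically, keeping scheduling balanced.

When MapReduce processes data-intensive jobs, its default Hash partitioning mechanism can skew the amount of data each node receives. For this problem the thesis proposes a partition cost model that evaluates the load-balancing cost of a partitioning, and, building on that model, a new fine-grained partitioning algorithm. The algorithm increases the number of partitions to reduce the skewed data within each partition, and uses the partition cost model to keep the amount of data received by each node relatively balanced.

Finally, an experimental environment and corresponding experimental scheme were set up, and the experiments verified that the proposed task scheduling algorithm and fine-grained partitioning algorithm improve the load balance of the cluster.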
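The abstract does not give the formula for the load-balance measure, so the following is only a minimal sketch under stated assumptions. Each node's capability is summarized as a single hypothetical `capacity` number (work units per second), the measure is taken to be assigned work divided by capacity (a projected finish time), and tasks are placed greedily, largest first, on the node whose projected finish time stays lowest. `update_capacity` loosely mimics the dynamic adjustment step by blending in a processing rate reported during execution. The names `Node`, `schedule`, `update_capacity` and the blending factor `alpha` are illustrative, not taken from the thesis.

```python
"""Hypothetical sketch: capacity-aware task assignment in a heterogeneous cluster.

The load-balance measure used here (assigned work divided by node capacity)
is an illustrative assumption, not the formula from the thesis.
"""
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: float        # assumed relative speed, e.g. work units per second
    assigned: float = 0.0  # total work currently assigned to this node

    def load_measure(self) -> float:
        # Projected time to finish the assigned work; lower means less loaded.
        return self.assigned / self.capacity


def schedule(tasks: list[float], nodes: list[Node]) -> dict[str, list[int]]:
    """Assign each task to the node whose projected finish time after taking
    the task is smallest, processing the largest tasks first."""
    plan: dict[str, list[int]] = {n.name: [] for n in nodes}
    # Longest-processing-time-first ordering tightens the greedy bound.
    order = sorted(range(len(tasks)), key=lambda i: tasks[i], reverse=True)
    for i in order:
        best = min(nodes, key=lambda n: (n.assigned + tasks[i]) / n.capacity)
        best.assigned += tasks[i]
        plan[best.name].append(i)
    return plan


def update_capacity(node: Node, observed_rate: float, alpha: float = 0.5) -> None:
    """Dynamic adjustment (assumed form): blend a newly observed processing
    rate, e.g. from a heartbeat report, into the node's capacity estimate."""
    node.capacity = alpha * observed_rate + (1 - alpha) * node.capacity
```

For example, with one node of capacity 2.0 and one of capacity 1.0, six equal tasks split 4:2, so both nodes reach the same projected finish time of 2.0.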
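Hadoop's default `HashPartitioner` sends a record to partition `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, so a few heavy keys can overload one reducer. The sketch below contrasts that with a fine-grained scheme in the spirit the abstract describes: keys are first hashed into `factor * reducers` small partitions, and the partitions are then packed onto reducers largest-first. It is a hypothetical illustration, not the thesis's algorithm. Per-key counts are assumed to be known in advance (for instance from sampling), `zlib.crc32` stands in for Java's `hashCode()`, and the partition cost is simplified to the ratio of the busiest reducer's load to the mean load. All function names and the `factor` parameter are illustrative.

```python
"""Hypothetical sketch: hash-partition skew vs. fine-grained partitioning.

Assumptions (not from the thesis): per-key record counts are known (e.g. from
sampling), zlib.crc32 stands in for Java's hashCode(), and the partition cost
is measured as max reducer load divided by mean reducer load.
"""
import zlib
from collections import Counter


def hash_partition(key: str, num_partitions: int) -> int:
    # Same idea as Hadoop's default HashPartitioner: hash(key) mod n.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def imbalance(loads: list[int]) -> float:
    """Simplified cost model: 1.0 means perfectly balanced reducers."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 1.0


def default_assignment(key_counts: Counter, reducers: int) -> list[int]:
    # One partition per reducer, as with default hash partitioning.
    loads = [0] * reducers
    for key, count in key_counts.items():
        loads[hash_partition(key, reducers)] += count
    return loads


def fine_grained_assignment(key_counts: Counter, reducers: int,
                            factor: int = 4) -> tuple[list[int], dict[int, int]]:
    """Split keys into factor * reducers small partitions, then pack the
    partitions onto reducers largest-first, each onto the least-loaded one."""
    num_parts = factor * reducers
    part_sizes = [0] * num_parts
    for key, count in key_counts.items():
        part_sizes[hash_partition(key, num_parts)] += count
    loads = [0] * reducers
    part_to_reducer: dict[int, int] = {}
    for p in sorted(range(num_parts), key=lambda p: part_sizes[p], reverse=True):
        r = min(range(reducers), key=lambda r: loads[r])
        part_to_reducer[p] = r
        loads[r] += part_sizes[p]
    return loads, part_to_reducer
```

In a real Hadoop job, a precomputed map like `part_to_reducer` would have to be applied inside a custom `Partitioner`'s `getPartition` method; that wiring is not shown. Note that a single very hot key still lands in one partition, so finer partitioning helps mainly when skew comes from several heavy keys colliding in the same bucket.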
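【Degree-granting institution】: Nanjing University of Posts and Telecommunications
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: TP311.13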

Article No.: 1920475

Link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1920475.html
