Research on MapReduce Performance Optimization Based on Hadoop
Topic: MapReduce + load balancing; Source: Nanjing University of Posts and Telecommunications, master's thesis, 2017
[Abstract]: With the continuous development of Internet technology, networks and enterprise production generate ever more data to process, and cloud computing has become a popular computing model for big data processing. Hadoop, an open-source cloud computing platform, quickly became a mainstream big data processing technology. As Hadoop clusters have come into wide use, their performance has become a focus of attention. Load balancing has an important impact on cluster performance and is the focus of this thesis, which studies and analyzes the load balancing problems that arise while MapReduce runs in order to optimize cluster performance. In a heterogeneous environment the computing power of nodes differs, so MapReduce task scheduling tends to load nodes unevenly; individual nodes then run for too long, which lengthens the response time of the whole job. To address this, the thesis proposes a task scheduling algorithm based on load balancing. By analyzing the characteristics of task execution and the performance of the nodes in a heterogeneous cluster, the algorithm derives a load balancing metric for task scheduling. This metric provides the basis for assigning tasks to nodes, so that each node receives a computational load that matches its performance. During task execution, a node communication model is used to adjust the load dynamically, which keeps task scheduling balanced. The default Hash partitioning mechanism used in MapReduce execution skews the data load received by nodes when processing data-intensive workloads. For this problem the thesis proposes a partition cost model that evaluates the cost of load imbalance across partitions and, on top of this model, a new fine-grained partitioning algorithm. The algorithm increases the number of partitions to reduce the skewed data within each partition, and uses the partition cost model to keep the amount of data received by each node relatively balanced. Finally, an experimental environment is set up and corresponding experiments are designed to verify that the proposed task scheduling algorithm and fine-grained partitioning algorithm improve the load balance of the cluster.
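The fine-grained partitioning idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the thesis's partition cost model, so the sketch substitutes a generic stand-in: keys are hashed into more micro-partitions than there are reducers, and the micro-partitions are then assigned largest-first to the currently least-loaded reducer. All function names (`micro_partition`, `count_sizes`, `assign_micro_partitions`, `reducer_loads`) are hypothetical and are not part of Hadoop or of the thesis.

```python
import heapq
import zlib
from collections import Counter
from typing import Dict, Iterable, List


def micro_partition(key: str, num_micro: int) -> int:
    """Hash a key deterministically into one of num_micro fine-grained partitions."""
    return zlib.crc32(key.encode("utf-8")) % num_micro


def count_sizes(keys: Iterable[str], num_micro: int) -> Dict[int, int]:
    """Count how many records fall into each micro-partition."""
    return dict(Counter(micro_partition(k, num_micro) for k in keys))


def assign_micro_partitions(sizes: Dict[int, int], num_reducers: int) -> Dict[int, int]:
    """Greedy stand-in for a cost model (an assumption, not the thesis's model):
    give each micro-partition, largest first, to the least-loaded reducer."""
    heap = [(0, r) for r in range(num_reducers)]  # (current load, reducer id)
    heapq.heapify(heap)
    assignment: Dict[int, int] = {}
    for part, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, reducer = heapq.heappop(heap)
        assignment[part] = reducer
        heapq.heappush(heap, (load + size, reducer))
    return assignment


def reducer_loads(sizes: Dict[int, int], assignment: Dict[int, int],
                  num_reducers: int) -> List[int]:
    """Total number of records each reducer receives under an assignment."""
    loads = [0] * num_reducers
    for part, size in sizes.items():
        loads[assignment[part]] += size
    return loads


if __name__ == "__main__":
    # Six micro-partitions with one heavy partition, two reducers.
    sizes = {0: 50, 1: 10, 2: 10, 3: 10, 4: 10, 5: 10}
    modulo = {p: p % 2 for p in sizes}  # plain hash-style mapping for comparison
    print(reducer_loads(sizes, modulo, 2))
    print(reducer_loads(sizes, assign_micro_partitions(sizes, 2), 2))
```

In this toy input the plain modulo mapping sends the heavy partition and two light ones to the same reducer, while the greedy assignment isolates the heavy partition. A real implementation would need the partition sizes in advance, for example from sampling, which is a further assumption of this sketch.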
[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1920475.html