Network and Scheduling Optimization for Heterogeneous MapReduce Clusters
Published: 2018-03-30 17:30
Topic: MapReduce  Entry point: OpenFlow  Source: Shanghai Jiao Tong University, master's thesis, 2014
[Abstract]: MapReduce scales well for processing large data sets, which has made it a very popular programming model in cloud computing. However, MapReduce performs poorly on heterogeneous clusters. The cause is the load-balancing mechanism of Hadoop MapReduce: backup tasks generate excessive network traffic and compete with Shuffle for bandwidth. Based on the OpenFlow protocol, this thesis proposes OFScheduler+, a dynamic optimization scheme for heterogeneous MapReduce clusters that reduces this contention. The scheme aims to reduce bandwidth contention and to improve link load balance and bandwidth utilization. It also improves the task assignment step of the MapReduce task scheduling algorithm, so that network conditions are taken into account when tasks are assigned. OFScheduler+ consists of the following four parts:
(1) A tagging mechanism that marks different types of traffic by modifying the ToS value in the IP header.
(2) A dynamic flow scheduling algorithm, specially optimized for the characteristics of the network underlying MapReduce, that improves the network utilization of the cluster.
(3) A flow rate control mechanism that, in effect, enables or disables MapReduce's load-balancing mechanism according to the current network state of the cluster.
(4) JobTracker queries the OpenFlow controller for the current network state and incorporates network factors into the task assignment of the MapReduce scheduling algorithm.
To evaluate the proposed scheme, we implemented a MapReduce simulator and a real OpenFlow testbed. The simulation results show that, in a heterogeneous cluster with a multipath topology, OFScheduler+ improves link bandwidth utilization and improves performance by 26-64% for most MapReduce jobs, with the largest benefit for data-intensive jobs. The experimental results on the testbed show that OFScheduler+ can be deployed in a real environment and achieves good results.
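The tagging mechanism in part (1) can be illustrated with a small sketch. The abstract only says that traffic types are marked through the ToS value of the IP header; the traffic classes, the specific ToS values, and the helper names below are all assumptions made for illustration, not the thesis's actual scheme. The only real API used is the standard `IP_TOS` socket option, which is available on Linux and most Unix-like systems.

```python
import socket
from enum import Enum


class TrafficClass(Enum):
    """Hypothetical traffic classes; the abstract does not enumerate them."""
    SHUFFLE = "shuffle"
    BACKUP_TASK = "backup_task"
    OTHER = "other"


# Assumed ToS byte values, chosen only for illustration. The two low-order
# bits are left at zero because they carry ECN in the current definition
# of this header field.
TOS_BY_CLASS = {
    TrafficClass.SHUFFLE: 0x20,
    TrafficClass.BACKUP_TASK: 0x40,
    TrafficClass.OTHER: 0x00,
}


def tos_for(traffic_class):
    """Return the ToS byte a sender would set for the given traffic class."""
    return TOS_BY_CLASS[traffic_class]


def classify(tos):
    """Map an observed ToS byte back to a traffic class.

    The ECN bits are masked out first; unknown values fall back to OTHER.
    """
    masked = tos & 0xFC
    for traffic_class, value in TOS_BY_CLASS.items():
        if value == masked:
            return traffic_class
    return TrafficClass.OTHER


def open_tagged_socket(traffic_class):
    """Create a TCP socket whose outgoing packets carry the class's ToS value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_for(traffic_class))
    return sock
```

With marking done at the sender, a switch or controller can tell the flows apart by matching on the same header field, which is presumably what lets the flow scheduler and rate controller in parts (2) and (3) treat backup-task traffic differently from Shuffle traffic.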
[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP311.13;TP393.01
Article ID: 1686902
Article link: http://sikaile.net/guanlilunwen/ydhl/1686902.html