Research on an Energy-Aware Scheduler for the Hadoop Platform
Published: 2019-02-13 03:21
[Abstract]: Data in every industry is growing rapidly day by day, and both academia and industry have recognized the great value hidden in it. This demand has driven the development of many data analysis frameworks and platforms, among which Hadoop is currently the most popular open-source platform; it implements the MapReduce computing model and the GFS storage model proposed by Google. Meanwhile, the greenhouse gases accumulated in recent years are changing the global climate, so data center construction should give priority to low-carbon operation and emission reduction, and enterprises are also spending more and more on data center electricity. As the number of hosts in Hadoop clusters keeps growing, controlling data center energy consumption is becoming an increasingly prominent problem. Studying how to reduce the energy consumption of Hadoop clusters at the platform level is therefore of great significance for both environmental protection and reducing enterprise costs.
Based on the working principles of the Hadoop platform and the architecture of the MapReduce runtime environment, this thesis builds an energy consumption control architecture in the Hadoop platform from the perspective of resource and task scheduling. The single-queue scheduler using the first-in, first-out algorithm (FIFO Scheduler) and the capacity-based scheduler (Capacity Scheduler) are two commonly used schedulers shipped with the platform. By testing and analyzing them, the thesis summarizes their shortcomings as a basis for an energy control framework on Hadoop. To address these shortcomings, it designs and implements an energy-aware scheduler for the Hadoop platform, which contains an energy control framework and a two-layer scheduling strategy for energy-saving scheduling of jobs onto resources.
The energy-aware scheduler has two characteristics: 1) it can adjust and balance the QoS and the total energy consumption of jobs running on the Hadoop cluster; 2) its own scheduling strategy is efficient. The overall framework of the scheduler is a multi-queue design. The two-layer scheduling strategy dynamically matches the tasks of each job to computing resources in an energy-saving way, and it is efficient, with linear time complexity. Jobs are assigned to the multiple queues by a method similar to consistent hashing, which ensures efficient dynamic assignment of jobs to queues and high concurrency of the system.
Finally, a Hadoop cluster of 32 virtual machines is built on the XCP (Xen Cloud Platform) cloud platform. In this environment, the energy-saving scheduler is compared experimentally with the FIFO Scheduler and the Capacity Scheduler shipped with Hadoop. One goal of the experiments is to compare the total energy consumption and running time of jobs when the cluster uses different schedulers under different job inputs; the other is to evaluate the energy-saving scheduler's own ability to control job energy consumption and running time. The results show that the energy-saving scheduler has good energy control ability without increasing the running time of cluster jobs, and that it can also adjust the balance between job running time and energy consumption well.
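The abstract says only that jobs are assigned to queues by "a method similar to consistent hashing" and gives no further detail. As an illustration of that general idea, the following is a minimal sketch of a plain consistent-hash ring that maps job IDs to queue names. It is not the thesis's implementation: the class name `QueueRing`, the method names, the `replicas` parameter (virtual nodes per queue), and the choice of MD5 as the ring hash are all assumptions made for this example.

```python
import bisect
import hashlib


def _ring_hash(key: str) -> int:
    """Map a string to a ring position (MD5 is an assumption; any stable hash works)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class QueueRing:
    """Hypothetical consistent-hash ring mapping job IDs to scheduler queues."""

    def __init__(self, queues, replicas=64):
        self.replicas = replicas  # virtual nodes per queue, smooths the distribution
        self._points = []         # sorted ring positions
        self._owner = {}          # ring position -> queue name
        for queue in queues:
            self.add_queue(queue)

    def add_queue(self, queue: str) -> None:
        """Place `replicas` virtual nodes for this queue on the ring."""
        for i in range(self.replicas):
            point = _ring_hash(f"{queue}#{i}")
            if point in self._owner:
                continue  # extremely unlikely collision: keep the existing owner
            self._owner[point] = queue
            bisect.insort(self._points, point)

    def remove_queue(self, queue: str) -> None:
        """Remove this queue's virtual nodes; its jobs fall to the next node clockwise."""
        for i in range(self.replicas):
            point = _ring_hash(f"{queue}#{i}")
            if self._owner.get(point) == queue:
                del self._owner[point]
                self._points.remove(point)

    def queue_for(self, job_id: str) -> str:
        """Return the queue owning the first ring position after the job's hash."""
        if not self._points:
            raise ValueError("no queues on the ring")
        idx = bisect.bisect_right(self._points, _ring_hash(job_id)) % len(self._points)
        return self._owner[self._points[idx]]


if __name__ == "__main__":
    ring = QueueRing(["queue-0", "queue-1", "queue-2", "queue-3"])
    jobs = [f"job_{i:04d}" for i in range(1000)]
    before = {job: ring.queue_for(job) for job in jobs}
    ring.add_queue("queue-4")
    moved = [job for job in jobs if ring.queue_for(job) != before[job]]
    # Only jobs now owned by the new queue change their assignment.
    print(f"{len(moved)} of {len(jobs)} jobs moved to the new queue")
```

In this sketch, adding or removing a queue reassigns only the jobs whose ring segment changes owner, which is the property that makes consistent hashing attractive for dynamic job-to-queue assignment. Each lookup is a binary search over the ring positions.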
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.09
Article ID: 2421118
Article link: http://sikaile.net/guanlilunwen/ydhl/2421118.html