
Research on an Energy-Aware Scheduler for the Hadoop Platform

Published: 2019-02-13 03:21

[Abstract]: Data in every industry is growing rapidly, and both academia and industry have found that it holds great value. This demand has driven the development of many data-analysis frameworks and platforms, of which Hadoop is currently the most popular open-source one; it implements the MapReduce computing model and the GFS storage model proposed by Google. Meanwhile, the greenhouse gases accumulated in recent years are changing the global climate, so data-center construction should give high priority to cutting carbon emissions, and enterprises are spending more and more on data-center electricity. As the number of hosts in Hadoop clusters keeps growing, controlling data-center energy consumption is becoming an increasingly pressing problem. Reducing the energy consumption of Hadoop clusters at the platform level is therefore important both for environmental protection and for lowering enterprise costs. Drawing on how the Hadoop platform works and on the architecture of the MapReduce runtime environment, this thesis builds an energy-consumption control architecture into the Hadoop platform from the perspective of resource and task scheduling. Hadoop ships with two commonly used schedulers: the single-queue FIFO Scheduler, which uses a first-in, first-out algorithm, and the Capacity Scheduler, which allocates by computing capacity. Testing and analyzing both, the thesis identifies their shortcomings as a basis for an energy-control framework. To address these shortcomings, it designs and implements an energy-aware scheduler for the Hadoop platform that contains an energy-control framework and uses a two-layer scheduling strategy to map jobs to resources in an energy-saving way.
The proposed energy-aware scheduler has two characteristics: 1) it can adjust the balance between QoS and total energy consumption while jobs run on a Hadoop cluster; 2) its own scheduling strategy is efficient. The scheduler is built around multiple queues. A two-layer scheduling strategy dynamically matches the tasks of each job to computing resources in an energy-saving way; the strategy is efficient, with linear time complexity. Jobs are distributed among the queues with a method similar to consistent hashing, which ensures efficient, dynamic assignment of jobs to queues and high system concurrency. Finally, a Hadoop cluster of 32 virtual machines was built on the XCP (Xen Cloud Platform) cloud platform, and the energy-saving scheduler was compared experimentally with Hadoop's built-in FIFO and Capacity schedulers. The experiments had two goals: first, to compare the total energy consumption and total running time of jobs under different job inputs when the cluster uses each scheduler; second, to evaluate how well the energy-saving scheduler itself can control the energy consumption and running time of jobs. The results show that the energy-saving scheduler controls energy consumption well without increasing job running time on the cluster, and that it can also effectively adjust the trade-off between job running time and energy consumption.
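The abstract states only that jobs are distributed among the scheduler's queues with "a method similar to consistent hashing" and gives no further detail. The Python sketch below shows a generic, textbook consistent-hash ring for that kind of job-to-queue mapping. It is not the thesis's implementation: the class name `ConsistentHashRing`, string job IDs and queue names, MD5 as the hash, and 64 virtual nodes per queue are all illustrative assumptions.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Generic consistent-hash ring that maps job IDs to queue names.

    Hypothetical illustration only: the thesis says its job-to-queue
    assignment is "similar to consistent hashing" without giving details.
    """

    def __init__(self, queues, vnodes=64):
        self.vnodes = vnodes  # virtual nodes per queue (assumed value)
        self._keys = []       # sorted ring positions
        self._owner = {}      # ring position -> queue name
        for queue in queues:
            self.add_queue(queue)

    @staticmethod
    def _hash(key):
        # Stable 64-bit position taken from an MD5 digest; Python's built-in
        # hash() of a str is salted per process, so it is not used here.
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def add_queue(self, queue):
        # Place `vnodes` virtual nodes for this queue on the ring.
        for i in range(self.vnodes):
            pos = self._hash(f"{queue}#{i}")
            bisect.insort(self._keys, pos)
            self._owner[pos] = queue

    def remove_queue(self, queue):
        # Remove this queue's virtual nodes; their jobs fall to the next owner.
        for i in range(self.vnodes):
            pos = self._hash(f"{queue}#{i}")
            self._keys.remove(pos)
            del self._owner[pos]

    def queue_for(self, job_id):
        if not self._keys:
            raise LookupError("no queues on the ring")
        # First virtual node clockwise from the job's position, wrapping around.
        idx = bisect.bisect_left(self._keys, self._hash(job_id)) % len(self._keys)
        return self._owner[self._keys[idx]]
```

For example, with `ring = ConsistentHashRing(["q0", "q1", "q2", "q3"])`, calling `ring.queue_for("job_42")` always returns the same queue. After `ring.add_queue("q4")`, the only jobs that change queue are those that now map to `q4`; every other job keeps its assignment. This locality under queue changes is what makes consistent hashing a plausible fit for the "efficient dynamic assignment" the abstract describes, although how the thesis adapts the technique is not stated here.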
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year awarded]: 2014
[Classification number]: TP393.09



