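Research on a Self-Learning Resource Scheduler Model Based on the Hadoop System
Keywords: Hadoop; resource scheduling; self-learning; MapReduce; job. Source: Huazhong University of Science and Technology, master's thesis, 2016. Document type: degree thesis.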
[Abstract]: With the arrival of the information explosion era, cloud computing and big data technologies have emerged. Hadoop is a framework that lets distributed clusters process large-scale data sets using the simple MapReduce programming model. As clusters keep growing, optimizing Hadoop's resource scheduler to raise cluster resource utilization, shorten task response time, and improve overall cluster efficiency has become a research hotspot in cloud computing. Drawing on the state of research at home and abroad, and after comparing several common Hadoop resource schedulers, this thesis improves a self-learning resource scheduler model based on job classification, with the aim of raising the resource utilization of heterogeneous Hadoop clusters while shortening job execution time.

The thesis covers the following work. First, it reviews the history of Hadoop and the state of research on Hadoop resource schedulers at home and abroad. Second, it explains the principles of Hadoop's two core components: the distributed file system HDFS and the parallel programming model MapReduce. Third, it analyzes in detail the three resource schedulers currently used in Hadoop (FIFO, Capacity Scheduler, and Fair Scheduler), explaining how each is implemented and discussing its strengths, weaknesses, and applicable scenarios. Fourth, it models the Hadoop system: the resources of each node are abstracted as virtual cores and memory, where a virtual core has one attribute, its execution rate, and memory has two attributes, its size and its data arrival rate. The system has three performance metrics: data locality, average job completion time, and fairness. Fifth, it improves the self-learning resource scheduler model and validates it experimentally. A job classifier is built first, and each job class has a corresponding queue. When a job arrives, it is added to the queue of its class. In the background, the scheduler maintains a quota table of the resource demand of each job class; based on historical statistics, it applies a specific real-time dynamic resource allocation strategy and periodically updates the quota table, forming a positive feedback loop.

In the experiments, three kinds of jobs (word count, sort, and matrix multiplication) are used for comparison. Job completion time, cluster CPU usage, and memory utilization are compared with Hadoop running the FIFO scheduler, the Capacity Scheduler, and the self-learning resource scheduler. The conclusion is that, for jobs whose Reduce phase is computationally light and short and for compute-intensive jobs with few disk I/O operations, the self-learning resource scheduler significantly shortens job completion time and improves cluster resource utilization.
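The queue-per-class and quota-table mechanism described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the abstract does not specify the classifier or the allocation strategy, so the sketch assumes that each job already carries a class label, that the quota table is updated with an exponential moving average of observed demand, and that capacity is split among classes with waiting jobs in proportion to their quotas. All names (`Job`, `SelfLearningScheduler`, `record_usage`, `shares`) are hypothetical.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    job_class: str  # assumed to be produced by a job classifier (not modeled here)


class SelfLearningScheduler:
    """Toy model: one queue per job class plus a quota table learned from history."""

    def __init__(self, default_quota=1.0, smoothing=0.5):
        self.default_quota = default_quota
        self.smoothing = smoothing          # weight given to the newest observation
        self.queues = defaultdict(deque)    # job class -> waiting jobs
        self.quota = {}                     # job class -> estimated resource demand

    def submit(self, job):
        """Place an arriving job in the queue of its class."""
        self.queues[job.job_class].append(job)
        self.quota.setdefault(job.job_class, self.default_quota)

    def record_usage(self, job_class, observed_demand):
        """Periodic update of the quota table from observed statistics.

        Assumption: an exponential moving average stands in for the
        unspecified update rule of the thesis.
        """
        old = self.quota.get(job_class, self.default_quota)
        a = self.smoothing
        self.quota[job_class] = (1 - a) * old + a * observed_demand

    def shares(self, capacity):
        """Split capacity among classes with waiting jobs, proportional to quota."""
        active = [c for c, q in self.queues.items() if q]
        total = sum(self.quota[c] for c in active)
        if total == 0:
            return {}
        return {c: capacity * self.quota[c] / total for c in active}
```

In this sketch, a class whose past jobs consumed more resources receives a larger share the next time its queue is non-empty, which is one simple way to realize the feedback between historical statistics and the quota table.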
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: TP311.13
Article ID: 1505440
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1505440.html