Research on a Hadoop-Based Job Scheduling Scheme
Published: 2018-04-03 13:13
Topic: clusters; Focus: job scheduling; Source: Northeastern University (東北大學), master's thesis, 2013
[Abstract]: In recent years, as information technology has advanced and enterprise digitization has deepened, the volume of data that enterprises must process has grown explosively. To improve enterprise process efficiency, profitability, and production capacity, a series of new technologies represented by cloud computing has emerged. Hadoop is an open-source parallel distributed computing platform that belongs to the PaaS service layer of cloud computing. Job scheduling in Hadoop means allocating the idle resources in the system to jobs according to a scheduling policy; the quality of that policy affects the cluster's resource utilization, the time jobs take, and overall cluster performance. This thesis analyzes the MapReduce and HDFS architectures in Hadoop and describes the Hadoop scheduling process and how to write a scheduler. The Hadoop platform currently relies mainly on four schedulers: the default FIFO scheduler, the Fair scheduler, the Capacity scheduler, and the speculative task scheduler. The thesis presents the algorithmic ideas behind these schedulers, compares their performance experimentally, and analyzes their shortcomings. On this basis, it proposes a job scheduling scheme consisting of a scheduler and a cluster load-balancing algorithm, and describes in detail the core idea of the algorithm, its pseudocode, and the class diagrams used by the scheme. In the experimental chapter, a simulation written in Java is used to test the scheduler's parameters and obtain a parameter combination with better performance. A Hadoop cluster is then built to test the load-balancing algorithm, after which the complete scheduling scheme is deployed on the cluster, tested in both homogeneous and heterogeneous environments, and compared with Hadoop's original schedulers. The experimental results show that, in a heterogeneous environment, the scheme outperforms the original schedulers on two metrics: total job running time and average turnaround time.
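The two metrics named in the abstract, total running time and average turnaround time, can be illustrated with a toy model. The sketch below is not the thesis's scheduler and not Hadoop's actual FIFO or Fair scheduler code; it assumes unit-length tasks, all jobs arriving at time 0, and a fixed number of slots. The function name `simulate` and its parameters are hypothetical, chosen only for this illustration.

```python
from typing import Dict, List


def simulate(task_counts: List[int], slots: int, policy: str) -> Dict[str, float]:
    """Toy slot scheduler (illustrative assumption, not Hadoop code).

    Each job j has task_counts[j] unit-length tasks; all jobs arrive at time 0.
    Every time step, `slots` tasks can run. Returns the total running time
    (makespan) and the average turnaround time over all jobs.
    """
    if not task_counts or slots <= 0:
        raise ValueError("need at least one job and one slot")
    n = len(task_counts)
    remaining = list(task_counts)
    served = [0] * n   # slots each job has received so far (used by "fair")
    finish = [0] * n   # time step at which each job completes
    clock = 0
    while any(r > 0 for r in remaining):
        clock += 1
        free = slots
        if policy == "fifo":
            # Earlier jobs take every slot they can use before later jobs run.
            for j in range(n):
                used = min(free, remaining[j])
                remaining[j] -= used
                free -= used
                if used and remaining[j] == 0:
                    finish[j] = clock
        elif policy == "fair":
            # Hand out one slot at a time to the unfinished job that has
            # received the least service so far (ties go to the earlier job).
            while free > 0:
                candidates = [j for j in range(n) if remaining[j] > 0]
                if not candidates:
                    break
                j = min(candidates, key=lambda k: served[k])
                remaining[j] -= 1
                served[j] += 1
                free -= 1
                if remaining[j] == 0:
                    finish[j] = clock
        else:
            raise ValueError(f"unknown policy: {policy}")
    return {"makespan": clock, "avg_turnaround": sum(finish) / n}


if __name__ == "__main__":
    jobs = [8, 2, 2]  # one long job submitted ahead of two short ones
    print("fifo:", simulate(jobs, slots=2, policy="fifo"))
    print("fair:", simulate(jobs, slots=2, policy="fair"))
```

In this toy setting both policies finish all work in the same total time, but sharing slots lets the short jobs finish earlier, which lowers the average turnaround time. Real Hadoop schedulers also deal with data locality, queues, and heterogeneous nodes, none of which is modeled here.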
[Degree-granting institution]: Northeastern University (東北大學)
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP338.6
Article ID: 1705342
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1705342.html