Research on Task Scheduling Algorithms under the MapReduce Framework
Topic: MapReduce + Hadoop; Source: Nanjing University of Science and Technology, master's thesis, 2017
[Abstract]: In recent years, big data computing has become an active research area. Hadoop and Spark are widely used big data computing platforms based on the MapReduce framework, and their performance depends largely on the quality of task scheduling. Research on task scheduling algorithms for Hadoop and Spark under the MapReduce framework therefore has both theoretical value and practical significance. This thesis focuses on two problems: batch job scheduling in the Hadoop environment and resource allocation for Web services in the Spark environment.

For batch job scheduling in Hadoop with the objective of minimizing the makespan (maximum completion time), the thesis models the problem as a two-stage hybrid flow shop scheduling problem with setup times and, based on a DAG (Directed Acyclic Graph) model, proposes two heuristic algorithms: DAGEA (Directed Acyclic Graph Earliest Available) and DAGEF (Directed Acyclic Graph Earliest Finish). Existing algorithms for the two-stage hybrid flow shop problem with setup times are usually constructed on Gantt charts, an approach that cannot effectively take each job's schedulable range into account. In contrast, DAGEA and DAGEF are constructed on a DAG: they use the DAG to compute the schedulable range of each job and adjust job start times accordingly, which effectively improves the performance and efficiency of the algorithms. Simulation experiments confirm this conclusion.

Spark computes in memory, whereas Hadoop computes on disk. Spark's current resource allocation considers only coarse-grained resources such as the number of free cores and the amount of memory. For Web service resource scheduling in Spark, this thesis additionally considers resource usage such as each cluster node's CPU utilization and processing capacity, re-evaluates the resource utilization of every node, and then allocates resources to tasks. The new resource scheduling method, MEAN, refines the resource granularity, thereby improving cluster resource utilization, increasing the number of Web requests processed, and improving concurrency.

Task scheduling and resource allocation are the core of a distributed big data computing platform, and their quality directly determines the platform's performance. This thesis studies task scheduling algorithms based on the MapReduce framework, focusing on batch scheduling in the Hadoop environment and resource allocation for Web services in the Spark environment. It proposes the DAGEA, DAGEF, and MEAN algorithms, and experiments demonstrate the effectiveness of the proposed algorithms.
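To make the batch scheduling model in the abstract concrete, the following is a minimal sketch of a two-stage flow shop with parallel slots in each stage, where every job runs a map phase and then a reduce phase and the objective is the makespan. It is a plain greedy "earliest available slot" list scheduler written for illustration only; it is not the thesis's DAGEA or DAGEF, it does not use a DAG, and it ignores setup times. The function name, the `(map_time, reduce_time)` job tuples, and the slot counts are all assumptions introduced here.

```python
import heapq
from typing import List, Tuple


def greedy_two_stage_makespan(
    jobs: List[Tuple[float, float]],  # hypothetical (map_time, reduce_time) pairs
    map_slots: int,
    reduce_slots: int,
) -> float:
    """Schedule jobs in the given order and return the makespan.

    Assumption: each job is one map task followed by one reduce task,
    and setup times are ignored.
    """
    # Each heap stores the times at which the slots of one stage become free.
    map_free = [0.0] * map_slots
    reduce_free = [0.0] * reduce_slots
    heapq.heapify(map_free)
    heapq.heapify(reduce_free)

    makespan = 0.0
    for map_time, reduce_time in jobs:
        # Stage 1: run the map phase on the earliest available map slot.
        map_end = heapq.heappop(map_free) + map_time
        heapq.heappush(map_free, map_end)

        # Stage 2: the reduce phase needs a free reduce slot and cannot
        # start before the job's own map phase has finished.
        reduce_start = max(heapq.heappop(reduce_free), map_end)
        reduce_end = reduce_start + reduce_time
        heapq.heappush(reduce_free, reduce_end)

        makespan = max(makespan, reduce_end)
    return makespan
```

With two map slots and one reduce slot, the job order `[(3, 2), (1, 4), (2, 1)]` yields a makespan of 10, while `[(1, 4), (2, 1), (3, 2)]` yields 8. The gap between the two orders is the kind of slack that a scheduling heuristic tries to exploit when it chooses job order and start times.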
[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP393.09; TP311.13
Article ID: 1758417
Article link: http://sikaile.net/guanlilunwen/ydhl/1758417.html