Research on SLA-Based MapReduce Scheduling Mechanisms
Published: 2019-03-09 14:24
[Abstract]: MapReduce is an effective solution for data analysis and processing and is widely used in large-scale data processing. As MapReduce adoption has grown, more and more service providers offer commercial MapReduce services. A provider runs MapReduce jobs to implement business logic and returns the analysis and processing results to the user. To protect both parties, the user and the provider sign a service level agreement (SLA). The provider must honor the SLA and meet performance requirements such as job response time, or it may incur a penalty for breach. How to schedule jobs and tasks effectively so that user SLAs are met has therefore become a key concern for service providers.

The diversity of SLAs and the shared nature of the cluster make this problem challenging. 1) Differing user needs lead to diverse job types. A cluster may simultaneously run ad hoc queries, large production jobs, and machine learning jobs; even on the same data set, short interactive jobs may be mixed with long batch jobs. Users accordingly have very different requirements for job response time in their SLAs. 2) To save the network and storage costs of building separate clusters and replicating data across clusters, providers share a MapReduce cluster among multiple user groups. As a result, a job's performance is easily affected by other concurrent jobs, which makes meeting user SLAs harder.

Existing MapReduce scheduling mechanisms focus on fair sharing of cluster resources among users, or allocate and schedule resources with priority-based policies. These mechanisms are not SLA-aware: job priorities are too coarse-grained to reflect the specific differences among user SLAs, and no accurate mapping can be established between priorities and SLAs. They are also unaware of dynamic changes in cluster state and job execution state, and so cannot meet user SLAs accurately and effectively.

To address these problems and challenges, this thesis proposes an SLA-based MapReduce scheduling mechanism, covering job performance modeling, job-level scheduling, and task-level scheduling optimization. The main work and results are:

1. An SLA-based MapReduce scheduling architecture is proposed. It introduces a pluggable scheduling support node that provides flexible support for user SLAs at both the job level and the task level. A dynamic, adaptive job performance model is given under this architecture; based on historical records and on cluster and job runtime state, it accurately predicts whether a job is likely to violate the response time upper bound in its SLA.

2. To address the diversity of user SLAs, a two-stage SLA-based job scheduling mechanism is proposed that builds on the job performance model. It predicts the minimum amount of resources needed to meet each user's SLA and each job's expected marginal revenue, partitions cluster resources accordingly, and schedules jobs so as to meet user SLAs to the greatest extent possible, avoid blind allocation of idle cluster resources, and increase the global revenue the service provider can obtain.

3. On top of the job-level scheduling policy, a data-distribution-aware task assignment optimization mechanism is proposed. It reduces as far as possible the data movement cost incurred while executing the tasks that make up a job and thereby, through the architecture's feedback loop, improves execution efficiency, shortens job response time, and improves the SLA satisfaction rate. With data-distribution awareness as its core idea, and in light of the different input data distribution characteristics of map tasks and reduce tasks, the mechanism uses the local scheduling weight of map tasks and the data transfer cost of reduce tasks, respectively, as the criteria for a greedy task assignment.

4. Experiments evaluate the accuracy of the job performance model, the effectiveness of the job-level scheduling policy in meeting user SLAs, and the improvement in task execution efficiency from the task-level assignment optimization, verifying the feasibility and effectiveness of this work.
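The two-stage job scheduling idea in contribution 2 can be illustrated with a small Python sketch. This is not the thesis's algorithm, which the abstract does not specify. Everything here is an assumption made for illustration: the `Job` fields, the perfectly parallel work estimate in `min_slots`, the diminishing-returns gain `base_gain / (extra + 1)`, and the tightest-deadline-first admission order. Stage 1 reserves the estimated minimum slots each job needs to meet its response-time bound; stage 2 hands out the remaining idle slots greedily by marginal revenue instead of allocating them blindly.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    remaining_work: float  # hypothetical estimate, in slot-seconds
    time_left: float       # seconds until the SLA deadline (assumed > 0)
    base_gain: float       # hypothetical revenue gain of the first extra slot


def min_slots(job: Job) -> int:
    """Fewest slots that finish the remaining work before the deadline,
    assuming (for illustration only) perfectly parallel work."""
    return math.ceil(job.remaining_work / job.time_left)


def two_stage_allocate(jobs: list[Job], total_slots: int) -> dict[str, int]:
    alloc = {job.name: 0 for job in jobs}
    free = total_slots

    # Stage 1: reserve each job's minimum, tightest deadline first.
    # A job whose minimum no longer fits is skipped in this sketch.
    admitted = []
    for job in sorted(jobs, key=lambda j: j.time_left):
        need = min_slots(job)
        if need <= free:
            alloc[job.name] = need
            free -= need
            admitted.append(job)

    # Stage 2: give idle slots one at a time to the admitted job with the
    # highest marginal revenue; the gain of each further slot shrinks
    # (assumed model: base_gain / (extra + 1)).
    by_name = {job.name: job for job in admitted}
    heap = [(-job.base_gain, job.name, 0) for job in admitted]
    heapq.heapify(heap)
    while free > 0 and heap:
        _, name, extra = heapq.heappop(heap)
        alloc[name] += 1
        free -= 1
        next_gain = by_name[name].base_gain / (extra + 2)
        heapq.heappush(heap, (-next_gain, name, extra + 1))
    return alloc


if __name__ == "__main__":
    demo = [Job("A", 100, 50, 4.0), Job("B", 300, 60, 6.0), Job("C", 1000, 10, 9.0)]
    print(two_stage_allocate(demo, total_slots=10))
```

In a real scheduler the minimum-slot estimate would come from the job performance model described in contribution 1, and the revenue figures from the SLA terms; both are replaced by fixed numbers here.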
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.09
Article ID: 2437547
Article link: http://sikaile.net/guanlilunwen/ydhl/2437547.html