Optimized Spark Resource Allocation Strategy Based on Performance Prediction
[Abstract]: Spark has become the most popular distributed big data computing platform. Thanks to its high performance, good fault tolerance, and unified design, it is widely used in industry. However, because the operation of the Spark platform is transparent to users, and because tasks running on Spark are affected by many factors, such as the data partitioning strategy, algorithm design and implementation, and the resources allocated to each node, Spark performance is very difficult to predict. This thesis builds a performance model based on the structure of Spark tasks, studies task execution time under different data volumes and partitioning strategies, seeks the balance between task execution time and cluster resource consumption, and proposes an optimized resource allocation strategy based on dynamic repartitioning. Building on fine-grained monitoring of cluster resources, it analyzes the execution information of each stage of a Spark task, establishes the performance model, and trains the model's parameters on a large body of historical experimental data. This makes it possible to predict the performance of Spark computing tasks with different load types. On this basis, we study the effect of the partitioning policy on Spark execution time. We find that although increasing the degree of parallelism on the nodes improves the performance of computing tasks to some extent, in some cases the improvement is minimal compared with the additional resources consumed; once the user's requirement for task running time has been met, such small improvements can be ignored. To save resources, the allocation should therefore be reduced as far as possible within the time requirement given by the user.
We find the best partitioning scheme by adding dynamic repartitioning to a series of real Spark computing tasks, and propose a repartitioning strategy based on task time prediction. Without sacrificing too much task running time, this strategy saves cluster resources, balances task execution time against cluster resource allocation, and guides users toward a reasonable use of cluster resources for Spark tasks. Experiments verify the soundness of the performance model and the accuracy of the predicted task execution times. On this basis, we propose an optimized resource allocation strategy based on performance prediction, and use dynamic repartitioning on a set of Spark workloads to find the cluster resource allocation that balances task execution time against cluster resource consumption. The experimental results show that the optimization strategy noticeably saves cluster resources while staying within the execution time given by users, and achieves a good balance between task execution time and cluster resource consumption.
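The selection principle described in the abstract (allocate as little as possible while the predicted running time still meets the user's requirement) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `predicted_time` is a hypothetical three-term cost model with made-up coefficients, not the fitted performance model from the thesis, and `choose_partitions` is a hypothetical helper, not part of Spark.

```python
from typing import Callable, Iterable, Optional


def predicted_time(partitions: int, data_gb: float,
                   fixed_s: float = 5.0,
                   per_gb_s: float = 12.0,
                   per_partition_s: float = 0.2) -> float:
    """Hypothetical execution-time model (assumption, not the thesis's model).

    fixed_s          : startup cost that does not depend on parallelism
    per_gb_s         : processing cost per GB, divided across partitions
    per_partition_s  : scheduling overhead that grows with the partition count
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    return fixed_s + per_gb_s * data_gb / partitions + per_partition_s * partitions


def choose_partitions(candidates: Iterable[int],
                      deadline_s: float,
                      predict: Callable[[int], float]) -> Optional[int]:
    """Return the smallest candidate partition count whose predicted time
    meets the deadline, or None if no candidate does."""
    for p in sorted(candidates):
        if predict(p) <= deadline_s:
            return p
    return None


if __name__ == "__main__":
    candidates = [1, 2, 4, 8, 16, 32]
    # Smallest partition count predicted to finish a 10 GB job within 40 s.
    best = choose_partitions(candidates, 40.0,
                             lambda p: predicted_time(p, 10.0))
    print(best)
```

In an actual Spark job, the chosen count could then be applied with `repartition` on the RDD or DataFrame before the expensive stage; how the thesis triggers repartitioning dynamically is not specified in the abstract.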
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP311.13
Article No.: 2308292
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2308292.html