基于迭代填充的內(nèi)存計算框架分區(qū)映射算法
發(fā)布時間:2018-08-10 22:33
【摘要】:針對內(nèi)存計算框架Spark在作業(yè)Shuffle階段一次分區(qū)產(chǎn)生的數(shù)據(jù)傾斜問題,提出一種內(nèi)存計算框架的迭代填充分區(qū)映射算法(IFPM)。首先,分析Spark作業(yè)的執(zhí)行機制,建立作業(yè)效率模型和分區(qū)映射模型,給出作業(yè)執(zhí)行時間和分配傾斜度的定義,證明這些定義與作業(yè)執(zhí)行效率的因果邏輯關(guān)系;然后,根據(jù)模型和定義求解,設(shè)計擴(kuò)展式數(shù)據(jù)分區(qū)算法(EPA)和迭代式分區(qū)映射算法(IMA),在Map端建立一對多分區(qū)函數(shù),并通過分區(qū)函數(shù)將部分?jǐn)?shù)據(jù)填入擴(kuò)展區(qū)內(nèi),在數(shù)據(jù)分布局部感知后再執(zhí)行擴(kuò)展區(qū)迭代式的多輪數(shù)據(jù)分配,根據(jù)Reduce端已分配數(shù)據(jù)量建立適應(yīng)性的擴(kuò)展區(qū)映射規(guī)則,對原生區(qū)的數(shù)據(jù)傾斜進(jìn)行逐步修正,以此保障數(shù)據(jù)分配的均衡性。實驗結(jié)果表明,在不同源數(shù)據(jù)分布條件下,算法均提高了作業(yè)Shuffle過程分區(qū)映射合理性,縮減了寬依賴Stage的同步時間,提高了作業(yè)執(zhí)行效率。
[Abstract]:Aiming at the problem of data skew caused by the primary partition of memory computing framework (Spark) in the stage of job Shuffle, an iterative padding partition mapping algorithm (IFPM).) for memory computing framework is proposed. Firstly, the execution mechanism of Spark jobs is analyzed, the job efficiency model and partition mapping model are established, the definitions of job execution time and assignment inclination are given, and the causal logic relationship between these definitions and job execution efficiency is proved. According to the model and definition, the extended data partition algorithm (EPA) and the iterative partition mapping algorithm (IMA),) are designed to establish one-to-many partition functions at the Map end. After the local perception of the data distribution, the extended region iterative multi-round data allocation is performed. According to the amount of data allocated on the Reduce terminal, the adaptive extended region mapping rules are established, and the data tilt of the native area is modified step by step. In order to ensure the balance of data distribution. The experimental results show that the algorithm improves the rationality of job Shuffle process partition mapping, reduces the synchronization time of wide dependent Stage, and improves the efficiency of job execution under the condition of different data distribution.
【作者單位】: 新疆大學(xué)信息科學(xué)與工程學(xué)院;
【基金】:國家自然科學(xué)基金資助項目(61262088,61462079,61363083,61562086) 新疆維吾爾自治區(qū)高?蒲杏媱濏椖(XJEDU2016S106)~~
【分類號】:TP333;TP301.6
[Abstract]:Aiming at the problem of data skew caused by the primary partition of memory computing framework (Spark) in the stage of job Shuffle, an iterative padding partition mapping algorithm (IFPM).) for memory computing framework is proposed. Firstly, the execution mechanism of Spark jobs is analyzed, the job efficiency model and partition mapping model are established, the definitions of job execution time and assignment inclination are given, and the causal logic relationship between these definitions and job execution efficiency is proved. According to the model and definition, the extended data partition algorithm (EPA) and the iterative partition mapping algorithm (IMA),) are designed to establish one-to-many partition functions at the Map end. After the local perception of the data distribution, the extended region iterative multi-round data allocation is performed. According to the amount of data allocated on the Reduce terminal, the adaptive extended region mapping rules are established, and the data tilt of the native area is modified step by step. In order to ensure the balance of data distribution. The experimental results show that the algorithm improves the rationality of job Shuffle process partition mapping, reduces the synchronization time of wide dependent Stage, and improves the efficiency of job execution under the condition of different data distribution.
【作者單位】: 新疆大學(xué)信息科學(xué)與工程學(xué)院;
【基金】:國家自然科學(xué)基金資助項目(61262088,61462079,61363083,61562086) 新疆維吾爾自治區(qū)高?蒲杏媱濏椖(XJEDU2016S106)~~
【分類號】:TP333;TP301.6
【相似文獻(xiàn)】
相關(guān)期刊論文 前10條
1 徐紅波;;空間填充曲線映射算法研究[J];科技信息(科學(xué)教研);2007年35期
2 孫培展;袁國良;;改進(jìn)的隱式空間映射算法的研究[J];電子設(shè)計工程;2012年09期
3 趙文慶;基于性能驅(qū)動的工藝映射算法[J];計算機輔助設(shè)計與圖形學(xué)學(xué)報;1992年03期
4 黎洪松;;一種改進(jìn)的自組織特征映射算法[J];中國民航學(xué)院學(xué)報;2006年01期
5 徐德智;黃利輝;陳建二;;一種新的基于樹分割的本體映射算法[J];小型微型計算機系統(tǒng);2009年11期
6 吳國福;竇強;竇文華;;基于查表的空間填充曲線映射算法[J];國防科技大學(xué)學(xué)報;2010年05期
7 陳];;心動陣列的自動映射算法[J];計算機研究與發(fā)展;1992年05期
8 黃勝;吳川川;楊曉非;王輝;張衛(wèi);;一種基于臨近原則的虛擬網(wǎng)絡(luò)映射算法[J];電信科學(xué);2013年12期
9 柳玉起;李明林;馮少宏;易國鋒;;基于有限元映射算法的試驗網(wǎng)格顯示及其應(yīng)用[J];華中科技大學(xué)學(xué)報(自然科學(xué)版);2007年03期
10 王琳珠;單_,
本文編號:2176416
本文鏈接:http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2176416.html
最近更新
教材專著