MALK: A MapReduce Framework for Efficiently Processing Large-Scale Key-Value Pairs
Published: 2018-11-09 21:24
【Abstract】: Memory allocation is one of the main bottlenecks that degrade MapReduce performance on shared-memory systems, and the problem is especially severe for applications that must process large numbers of keys. To address it, this paper proposes MALK (high-efficient MapReduce for applications having large amount of keys), a MapReduce parallel computing framework that has low memory overhead and handles large-scale key-value data efficiently. MALK manages the scattered, large-scale keys and values in contiguous storage, which avoids a large number of small memory allocations. It processes Map-phase tasks at a finer granularity and pipelines Reduce-phase tasks, reducing the amount of data that is live at the same time during execution and thereby keeping the application's memory demand within a small range. It also introduces a hash-table reuse mechanism that recycles hash-table storage, so that hash-table memory is not allocated again and again during pipelining. Finally, MALK accounts for how task granularity and task count affect both task-management overhead and overall performance, and sets the number of Reduce-phase tasks to the value that gives the best system performance. Experimental results show that, compared with Phoenix++, MALK improves performance by up to 3.8× (2.8× on average) and saves up to 95.2% and 87.8% of storage space in the Map and Reduce phases, respectively. MALK also achieves better load balance in the Reduce phase and lowers the L2 and LLC cache miss rates.
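The abstract describes two of MALK's memory-saving ideas only in prose: storing many small keys contiguously instead of allocating each one separately, and reusing one hash table's storage across pipelined Reduce tasks. The Python sketch below illustrates both under our own assumptions; it is not MALK's implementation. The names `KeyArena`, `ReusableHashTable`, and `pipelined_reduce`, the linear-probing layout, the fixed `capacity` parameter, and the sum-combining reducer are all hypothetical. Python does not expose raw allocation, so the sketch shows only the bookkeeping: keys live back to back in one `bytearray` addressed by `(offset, length)`, and `reset()` rewinds counters and clears used slots rather than freeing buffers.

```python
from collections.abc import Iterable, Iterator


class KeyArena:
    """Hypothetical contiguous key store: all keys share one growable buffer.

    A key is referenced by (offset, length) instead of living in its own
    small allocation. reset() only rewinds the write position, so the
    buffer's capacity is kept for the next Reduce task.
    """

    def __init__(self) -> None:
        self.buf = bytearray()
        self.used = 0  # bytes written since the last reset

    def append(self, key: bytes) -> tuple[int, int]:
        end = self.used + len(key)
        if end > len(self.buf):
            # Grow geometrically so appends stay amortized O(1).
            new_cap = max(end, 2 * len(self.buf))
            self.buf.extend(bytes(new_cap - len(self.buf)))
        self.buf[self.used:end] = key  # same-length slice: overwrite in place
        off = self.used
        self.used = end
        return off, len(key)

    def get(self, off: int, length: int) -> bytes:
        return bytes(self.buf[off:off + length])

    def reset(self) -> None:
        self.used = 0  # keep the buffer, forget its contents


class ReusableHashTable:
    """Hypothetical fixed-capacity, linear-probing table for one Reduce task.

    Slot arrays are allocated once; reset() clears only the slots that were
    used, so the same storage serves every task in the pipeline.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.arena = KeyArena()
        self.offs = [-1] * capacity  # -1 marks an empty slot
        self.lens = [0] * capacity
        self.vals = [0] * capacity
        self.used_slots: list[int] = []

    def add(self, key: bytes, value: int) -> None:
        """Combine value into the running sum for key."""
        i = hash(key) % self.capacity
        for _ in range(self.capacity):
            off = self.offs[i]
            if off == -1:  # empty slot: first occurrence of this key
                self.offs[i], self.lens[i] = self.arena.append(key)
                self.vals[i] = value
                self.used_slots.append(i)
                return
            if self.lens[i] == len(key) and self.arena.get(off, self.lens[i]) == key:
                self.vals[i] += value
                return
            i = (i + 1) % self.capacity  # linear probing
        raise RuntimeError("hash table full; choose a larger capacity")

    def items(self) -> Iterator[tuple[bytes, int]]:
        for i in self.used_slots:
            yield self.arena.get(self.offs[i], self.lens[i]), self.vals[i]

    def reset(self) -> None:
        for i in self.used_slots:
            self.offs[i] = -1
        self.used_slots.clear()
        self.arena.reset()


def pipelined_reduce(
    partitions: Iterable[Iterable[tuple[bytes, int]]], capacity: int
) -> Iterator[tuple[bytes, int]]:
    """Reduce partitions one after another, recycling a single table."""
    table = ReusableHashTable(capacity)
    for part in partitions:
        for key, value in part:
            table.add(key, value)
        results = list(table.items())  # emit before the storage is recycled
        table.reset()
        yield from results
```

Each partition stands for one Reduce task and is assumed to be key-disjoint from the others (as with hash partitioning), so `dict(pipelined_reduce(parts, capacity=8))` collects the final sums. Because the table is reset rather than rebuilt, its slot arrays and key buffer are allocated once and reused for the whole pipeline; `capacity` must exceed the number of distinct keys in any single partition. How MALK actually sizes its tables is not stated in the abstract, so the fixed capacity here is only a placeholder.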
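【Affiliations】: State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences); University of Chinese Academy of Sciences; College of Information Engineering, Capital Normal University
【Funding】: National Basic Research Program of China (973 Program) (2011CB302501); National Science Fund for Distinguished Young Scholars (60925009); National Natural Science Foundation of China (60921002, 61173007, 61100013, 61100015, 61202059, 61202055); National High Technology Research and Development Program of China (863 Program) (2012AA012301, 2012AA010303); Beijing Nova Program (2010B058); Open Project of the State Key Laboratory of Computer Architecture (CARCH201203)
【CLC number】: TP333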