Design and Implementation of BIRCH Algorithm Parallelization Based on Spark
Published: 2018-06-26 19:41
Topic of this paper: Spark + BIRCH parallelization. Reference: Computer Engineering & Science, 2017, Issue 01
[Abstract]: In an era of distributed computing in which memory is king, Spark, a distributed framework based on in-memory computing, has received unprecedented attention and adoption. This paper focuses on the design and implementation of a parallelized BIRCH algorithm on Spark. A theoretical performance analysis identifies the Spark transformation operations that consume the most time during parallelization. Then, guided by the directed acyclic graph (DAG) of the parallelized BIRCH algorithm, the frequency of shuffles and of disk reads and writes is reduced in order to optimize performance. Finally, the parallelized BIRCH algorithm is compared experimentally with the single-machine BIRCH algorithm and with the K-Means clustering algorithm in MLlib. The results show that parallelizing BIRCH on Spark causes no obvious loss of clustering quality and achieves satisfactory running time and speedup.
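The abstract does not describe the parallel design itself, so the following is only a minimal sketch of the property that generally makes BIRCH amenable to partition-wise processing: a clustering feature (CF) is the triple (N, LS, SS), and two CFs merge by component-wise addition. The `CF` class, the `summarize` helper, and the sample partitions are hypothetical names and data introduced for illustration; they are not taken from the paper. In Spark, the per-partition step would plausibly correspond to `mapPartitions` and the merge step to `reduce`, but that mapping is an assumption here, and the sketch runs in plain Python without Spark.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class CF:
    """Clustering feature (hypothetical helper): count, linear sum, squared sum."""
    n: int
    ls: Tuple[float, ...]  # per-dimension sum of the points
    ss: float              # sum of squared norms of the points

    @staticmethod
    def of_point(p: Sequence[float]) -> "CF":
        return CF(1, tuple(p), sum(x * x for x in p))

    def merge(self, other: "CF") -> "CF":
        # CF additivity: merging two summaries is component-wise addition.
        return CF(
            self.n + other.n,
            tuple(a + b for a, b in zip(self.ls, other.ls)),
            self.ss + other.ss,
        )

    def centroid(self) -> Tuple[float, ...]:
        return tuple(x / self.n for x in self.ls)


def summarize(points: Iterable[Sequence[float]]) -> CF:
    """Summarize one non-empty partition of points into a single CF."""
    return reduce(CF.merge, (CF.of_point(p) for p in points))


# Illustrative data: two "partitions" standing in for distributed data blocks.
partitions = [[(1.0, 2.0), (3.0, 4.0)], [(5.0, 6.0)]]

# Summarize each partition independently, then merge the partial summaries.
merged = reduce(CF.merge, [summarize(part) for part in partitions])

# Summarizing all points at once yields the same CF.
whole = summarize([p for part in partitions for p in part])
```

Because the merge is associative and commutative, only the small per-partition summaries need to be combined rather than the raw points, which is one way a design could keep shuffle volume low. A real BIRCH implementation maintains a CF tree with a distance threshold per partition instead of a single CF.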
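[Author affiliations]: Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications; School of Computer Science, Beijing University of Posts and Telecommunications; State Grid Shandong Electric Power Research Institute
[Funding]: National 863 Program (2015AA050204); State Grid Science and Technology Project (60873120)
[Classification number]: TP311.13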
Article ID: 2071189
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2071189.html