天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

當(dāng)前位置:主頁(yè) > 科技論文 > 軟件論文 >

基于Hadoop的社區(qū)發(fā)現(xiàn)算法并行化研究

發(fā)布時(shí)間:2018-04-22 08:04

  本文選題:社區(qū)發(fā)現(xiàn) + Hadoop; 參考:《江西理工大學(xué)》2017年碩士論文


【摘要】:社區(qū)發(fā)現(xiàn)算法是研究復(fù)雜網(wǎng)絡(luò)中社區(qū)結(jié)構(gòu)的主要方法。隨著網(wǎng)絡(luò)規(guī)模爆發(fā)式的增長(zhǎng),傳統(tǒng)單機(jī)、串行的社區(qū)發(fā)現(xiàn)算法已不適用于處理當(dāng)前大規(guī)模的網(wǎng)絡(luò)。Hadoop作為新興的一種大數(shù)據(jù)處理技術(shù),因其高擴(kuò)展、高可靠、編程模型簡(jiǎn)單受到許多開(kāi)發(fā)者的青睞。針對(duì)當(dāng)前串行社區(qū)發(fā)現(xiàn)算法處理網(wǎng)絡(luò)規(guī)模有限問(wèn)題,本文結(jié)合Hadoop框架在大數(shù)據(jù)處理方面的優(yōu)勢(shì)提出兩種串行社區(qū)發(fā)現(xiàn)算法并行化改造方案。針對(duì)Fast-Newman算法計(jì)算節(jié)點(diǎn)模塊度復(fù)雜度比較高問(wèn)題,本文提出基于Hadoop框架調(diào)度的Fast-Newman并行算法。并行化Fast-Newman算法存在的主要難點(diǎn)為在map函數(shù)中各節(jié)點(diǎn)無(wú)法獲取其鄰居節(jié)點(diǎn)的信息,故還需借助Web服務(wù)提供全局圖信息。改進(jìn)后的Fast-Newman算法將在map函數(shù)中并行的計(jì)算每個(gè)節(jié)點(diǎn)與其鄰居節(jié)點(diǎn)合并后的模塊度增量,在reduce函數(shù)中匯總找出模塊度增量最大的兩個(gè)節(jié)點(diǎn)并將該兩節(jié)點(diǎn)合并,此為一次合并過(guò)程。合并后的結(jié)果重新作為map函數(shù)的輸入,迭代執(zhí)行map、reduce過(guò)程直到所有節(jié)點(diǎn)合并成一個(gè)社區(qū)。采用Ego-Facebook作為數(shù)據(jù)集,在仿真環(huán)境下實(shí)驗(yàn)結(jié)果表明并行化后的Fast-Newman算法具有較高的加速比。針對(duì)處理大規(guī)模網(wǎng)絡(luò)難問(wèn)題,本文提出基于Hadoop的Fast-Unfolding并行化算法PFU(Parallel-Fast-Unfolding)。該算法主要采用“分而治之”的思想,首先將大規(guī)模網(wǎng)絡(luò)分區(qū)并各自合并,然后根據(jù)各分區(qū)合并結(jié)果重構(gòu)網(wǎng)絡(luò),最后迭代合并重構(gòu)網(wǎng)絡(luò)直到社區(qū)結(jié)構(gòu)不再發(fā)生變化。該并行化方案存在兩個(gè)難點(diǎn):一是如何保證分區(qū)后邊連接信息不會(huì)丟失;二是分區(qū)完成后如何重構(gòu)網(wǎng)絡(luò)。針對(duì)上述兩個(gè)難點(diǎn),本文通過(guò)改進(jìn)數(shù)據(jù)存儲(chǔ)方式以及設(shè)計(jì)重構(gòu)方案有效地解決了該問(wèn)題。在真實(shí)網(wǎng)絡(luò)和生成網(wǎng)絡(luò)兩種數(shù)據(jù)集上實(shí)驗(yàn)結(jié)果表明,PFU算法在保證準(zhǔn)確率的基礎(chǔ)上明顯的提高了算法運(yùn)行的效率,具有較好的擴(kuò)展性。最后,根據(jù)map、reduce階段輸出的中間文件,用gephi軟件對(duì)結(jié)果進(jìn)行可視化提高了PFU算法的應(yīng)用價(jià)值。
[Abstract]:Community discovery algorithm is the main method to study the community structure in complex networks. With the explosive growth of network scale, the traditional single-machine, serial community discovery algorithm is no longer suitable to deal with the current large-scale network. Hadoop as a new big data processing technology, because of its high expansion, high reliability, Simple programming models are popular with many developers. In view of the limited network size of the current serial community discovery algorithm, this paper proposes two parallel transformation schemes of serial community discovery algorithm based on the advantages of Hadoop framework in big data processing. Aiming at the high complexity of computing node modules in Fast-Newman algorithm, a Fast-Newman parallel algorithm based on Hadoop framework scheduling is proposed in this paper. The main difficulty of parallelization Fast-Newman algorithm is that each node can not get the information of its neighbor nodes in the map function, so it is necessary to provide global graph information with the help of Web service. The improved Fast-Newman algorithm will calculate the modular degree increment of each node and its neighbor nodes in parallel in map function, and find out the two nodes with the largest module degree increment in the reduce function and merge the two nodes together, which is a merging process. The merged results are reused as input to the map function, iterating through the map-reduce process until all nodes are merged into one community. Using Ego-Facebook as data set, the simulation results show that the parallel Fast-Newman algorithm has a high speedup. In order to deal with the problem of large scale network, this paper proposes a parallel Fast-Unfolding algorithm based on Hadoop, PFUU Parallel-Fast-Unfolding algorithm (PFU / Parallel-Fast-Unfolding). The algorithm mainly adopts the idea of "divide and conquer". Firstly, the large-scale network is partitioned and merged separately, then the network is reconstructed according to the results of each partition merging. Finally, the network is merged and reconstructed iteratively until the community structure is no longer changed. There are two difficulties in the parallelization scheme: one is how to ensure that the connection information behind the partition will not be lost; the other is how to reconstruct the network after the partition is completed. In view of the above two difficulties, this paper solves the problem effectively by improving the data storage method and designing the refactoring scheme. Experimental results on two data sets of real network and generated network show that the PFU algorithm can obviously improve the efficiency and scalability of the algorithm on the basis of ensuring the accuracy. Finally, according to the intermediate file output from map reduce phase, the application value of PFU algorithm is improved by using gephi software to visualize the result.
【學(xué)位授予單位】:江西理工大學(xué)
【學(xué)位級(jí)別】:碩士
【學(xué)位授予年份】:2017
【分類號(hào)】:TP311.13;O157.5

【參考文獻(xiàn)】

相關(guān)期刊論文 前10條

1 吳衛(wèi)江;李沐南;李國(guó)和;;Louvain算法的并行化處理[J];計(jì)算機(jī)與數(shù)字工程;2016年08期

2 陳羽中;施松;朱偉平;於志勇;郭昆;;一種基于鄰域跟隨關(guān)系的增量社區(qū)發(fā)現(xiàn)算法[J];計(jì)算機(jī)學(xué)報(bào);2017年03期

3 王昊宇;吳斌;;基于MapReduce的二分圖社團(tuán)發(fā)現(xiàn)[J];計(jì)算機(jī)應(yīng)用與軟件;2015年06期

4 孫霞;張敏超;馮筠;張蕾;何緋娟;;Hadoop框架下的多標(biāo)簽傳播算法[J];西安交通大學(xué)學(xué)報(bào);2015年05期

5 辛宇;楊靜;謝志強(qiáng);;基于隨機(jī)游走的語(yǔ)義重疊社區(qū)發(fā)現(xiàn)算法[J];計(jì)算機(jī)研究與發(fā)展;2015年02期

6 畢娟;秦志光;;基于概率主題模型的社交網(wǎng)絡(luò)層次化社區(qū)發(fā)現(xiàn)算法[J];電子科技大學(xué)學(xué)報(bào);2014年06期

7 和亮;馮登國(guó);蘇璞睿;應(yīng)凌云;楊軼;;基于社團(tuán)并行發(fā)現(xiàn)的在線社交網(wǎng)絡(luò)蠕蟲抑制[J];計(jì)算機(jī)學(xué)報(bào);2015年04期

8 馬千里;張俊浩;;一種局部強(qiáng)化的多標(biāo)簽傳播社區(qū)發(fā)現(xiàn)算法[J];計(jì)算機(jī)工程;2014年06期

9 武森;盧丹;馮小東;杜彥南;;基于大規(guī)模復(fù)雜網(wǎng)絡(luò)社區(qū)發(fā)現(xiàn)的科研合著網(wǎng)絡(luò)分析[J];中國(guó)科技論文;2014年04期

10 李冠辰;;一個(gè)基于hadoop的并行社交網(wǎng)絡(luò)挖掘系統(tǒng)[J];軟件;2013年12期



本文編號(hào):1786300

資料下載
論文發(fā)表

本文鏈接:http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1786300.html


Copyright(c)文論論文網(wǎng)All Rights Reserved | 網(wǎng)站地圖 |

版權(quán)申明:資料由用戶15765***提供,本站僅收錄摘要或目錄,作者需要?jiǎng)h除請(qǐng)E-mail郵箱bigeng88@qq.com
99一级特黄色性生活片| 国产欧美一区二区久久| 中文字日产幕码三区国产| 视频一区二区三区自拍偷| 尹人大香蕉中文在线播放| 三级理论午夜福利在线看| 中文字幕一二区在线观看| 国产欧美高清精品一区| 亚洲欧美日韩国产成人| 精品人妻精品一区二区三区| 日韩精品一级一区二区| 日本一本不卡免费视频 | 69久久精品亚洲一区二区| 亚洲日本久久国产精品久久| 中文字幕人妻综合一区二区| 国产精品亚洲综合色区韩国 | 国产精品免费视频视频| 亚洲男人的天堂就去爱| 久久国产亚洲精品成人| 亚洲av又爽又色又色| 国产精品久久香蕉国产线| 国产精品香蕉在线的人| 少妇丰满a一区二区三区| 大香蕉伊人精品在线观看| 国产极品粉嫩尤物一区二区| 国产免费黄片一区二区| 国产精品午夜性色视频| 欧美一区二区三区喷汁尤物| 殴美女美女大码性淫生活在线播放| 亚洲伊人久久精品国产| 国产欧美日产久久婷婷| 国产精品一区二区视频成人| a久久天堂国产毛片精品| 国产精品视频一级香蕉| 久久福利视频视频一区二区| 国产成人国产精品国产三级| 亚洲国产四季欧美一区| 91精品国产品国语在线不卡| 日韩欧美一区二区黄色| 91欧美日韩国产在线观看| 日韩精品一级一区二区|