Research and Implementation of a Parallel Graph Mining Platform Based on a Pregel-Like Architecture

Published: 2019-03-02 10:07

[Abstract]: Graphs are an important type of unstructured data and have greater semantic and structural expressive power than linear lists and trees. Many real-world problems can be modeled as graphs, and graph data processing and its applications are ubiquitous. With the explosive growth of information and the rapid development of social networks, the scale of the graph data to be processed keeps increasing, which poses a major challenge for processing large-scale graphs efficiently. Cloud computing is one effective means of handling massive data mining tasks and improving data mining capability. Taking this as its starting point, this thesis designs and implements a parallel graph mining platform based on a Pregel-Like architecture. HDFS (Hadoop Distributed File System) is chosen as the underlying cloud storage layer and provides efficient, secure, and highly fault-tolerant storage for large-scale graphs. The computing framework uses the BSP (Bulk Synchronous Parallel) model, whose characteristic supersteps are well suited to graph algorithms that require iterative computation and make the analysis of large-scale graph data feasible. The platform integrates a range of graph mining algorithms and visualizes both the results of each algorithm and the network topology.

The main work of this thesis is as follows. Based on a survey of classical graph mining algorithms and their improved variants, it designs and implements, on top of the underlying framework, improved versions of important classical algorithms for graph property analysis and graph ranking. It compares the results of the classical and improved algorithms on real datasets, and finally proposes a new graph clustering algorithm. Specifically:

1) Two parallel graph property algorithms, K-shell decomposition and semi-local centrality, are designed and implemented. Both characterize node importance better than degree, and both are cheaper to compute than betweenness. Their distributions, and their correlation with degree, are analyzed on public datasets.

2) Two graph ranking algorithms, LeaderRank and SALSA (Stochastic Approach for Link Structure Analysis), are designed and implemented. Compared with PageRank, LeaderRank converges faster, identifies influential nodes better, and is more robust to interference. Compared with HITS (Hypertext-Induced Topic Search), the hub value produced by SALSA is a better measure of a node's spreading ability.

3) A community detection algorithm based on K-shell values is proposed. It uses K-shell decomposition to reduce the size of the network and thereby improve running efficiency, and its correctness is verified on "benchmark" networks.
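To make the combination of BSP supersteps and K-shell decomposition concrete, the following is a minimal single-process sketch in Python. It is not the thesis's implementation, which is not shown on this page. It swaps in a known vertex-centric formulation: every vertex starts from its degree and, in each superstep, replaces its value with the h-index of its neighbours' values from the previous superstep. The values stop changing once they equal the k-shell index (coreness). The function names `build_undirected`, `h_index`, and `k_shell_bsp` are hypothetical, and the graph is assumed to be simple and undirected.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Adjacency = Dict[Hashable, Set[Hashable]]


def build_undirected(edges: Iterable[Tuple[Hashable, Hashable]]) -> Adjacency:
    """Build an undirected adjacency map, ignoring self-loops and duplicates."""
    adj: Adjacency = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def h_index(values: List[int]) -> int:
    """Return the largest h such that at least h of the values are >= h."""
    h = 0
    for rank, value in enumerate(sorted(values, reverse=True), start=1):
        if value >= rank:
            h = rank
        else:
            break
    return h


def k_shell_bsp(adj: Adjacency, max_supersteps: int = 1000) -> Tuple[Dict[Hashable, int], int]:
    """Simulate BSP supersteps; return (k-shell index per vertex, supersteps used)."""
    # Superstep 0: each vertex initialises its estimate to its degree.
    estimate = {v: len(nbrs) for v, nbrs in adj.items()}
    for superstep in range(1, max_supersteps + 1):
        # Compute phase: every vertex reads only values from the previous
        # superstep, which mimics messages delivered across a barrier.
        updated = {
            v: h_index([estimate[u] for u in nbrs]) for v, nbrs in adj.items()
        }
        # Halting condition: no vertex changed, so all vertices vote to halt.
        if updated == estimate:
            return estimate, superstep
        # Barrier: the new values become visible to the next superstep.
        estimate = updated
    raise RuntimeError("did not converge within max_supersteps")
```

For example, on a triangle `a, b, c` with a tail `a-d-e`, `k_shell_bsp(build_undirected([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d"), ("d", "e")]))` assigns shell 2 to the triangle vertices and shell 1 to `d` and `e`. In a real Pregel-like system the dictionary comprehension would be replaced by a per-vertex compute function that receives neighbour values as messages. The sketch keeps everything in one process so that the superstep and barrier structure stays visible.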
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.13

Wang Yang; Research and Implementation of a Parallel Graph Mining Platform Based on a Pregel-Like Architecture [D]; Beijing University of Posts and Telecommunications; 2016

Article ID: 2432950

Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2432950.html
