天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

當前位置:主頁 > 科技論文 > 軟件論文 >

圖挖掘算法的研究與應用

發(fā)布時間:2018-04-14 00:16

  本文選題:圖挖掘 + 極大團 ; 參考:《北京郵電大學》2016年碩士論文


【摘要】:近年來,隨著社交網(wǎng)絡的興起,圖挖掘越來越受到研究人員的重視,有關圖挖掘研究的論文在SigKDD、ICDM、SiamDM等會議中有逐年遞增的趨勢。極大團挖掘是圖挖掘中的一個基本問題,有著豐富的研究意義與極高的實踐價值。如進行社交網(wǎng)絡分析、利用郵件網(wǎng)絡推斷社會等級、研究行為與認知網(wǎng)絡的結構、對金融網(wǎng)絡進行統(tǒng)計分析、對動態(tài)網(wǎng)絡進行聚類、偵測各種恐怖活動和緊急事件等。本論文以極大團算法的研究為切入點,對這一基本的圖挖掘問題進行了理論上的研究并給出了其實踐中的應用。論文的研究主要分為三大部分:1.研究并實現(xiàn)了 Base BK算法、Improved BK系列算法和Kose系列算法,對部分算法進行對比實驗,分析算法的適用場景與時空優(yōu)劣。首先分析Base BK算法,這是最早的一個使用遞歸方式挖掘極大團的經(jīng)典算法。進而研究Improved BK系列算法,這些算法致力于解決Base BK算法遞歸空間樹規(guī)模過于龐大的問題,提出了各種啟發(fā)式選擇標志節(jié)點的方法對Base BK算法的遞歸空間樹進行剪枝。其中著名的是最小計數(shù)器版本的Improved BK算法和隨機CANDIDATESNOT版本的Improved BK算法,論文都有理論分析與實現(xiàn)。也研究了 Kose系列算法,Kose算法使用迭代方式挖掘圖中的極大團,雖然避免了生成不必要的子團,卻導致內存消耗急劇增大。由此衍生了 Kose RAM算法和Kose Disk算法,它們分別將中間數(shù)據(jù)存入內存和硬盤,所以耗費的內存不同。本論文研究并實現(xiàn)了 Kose RAM算法。最后取七個經(jīng)典的極大團挖掘算法,在單進程模式下實現(xiàn),在相同的實驗環(huán)境下做了 1000多個實驗,根據(jù)實驗結果反饋分析算法的原理并給出了不同環(huán)境下如何選擇極大團挖掘算法的建議。2.研究FAMCELM算法,指出使其運行失效的場景,進而提出受限內存下的解決方案,并以理論和實驗證明算法的可行性。Fast Algorithms for Maximal Clique Enumeration with Limited Memory 論文提出了在大規(guī)模圖中挖掘極大團一個分布式的算法。這一算法有著三大優(yōu)勢:適用于內存有限的場景、減少了挖掘極大團的機器的CPU消耗、可以在并行環(huán)境下實現(xiàn)。研究中發(fā)現(xiàn),面對圖中存在度極大的節(jié)點的情形,論文中的SeqMCE算法會失效。分析了產(chǎn)生這個問題的原因,給出了解決這一問題的一個方案及嚴格的理論證明。在對SeqMCE做了其他的優(yōu)化后提出了 Improved SeqMCE算法,這一算法解決了遇到度極大的節(jié)點的問題,代價則是重復挖掘了部分全局極大團。最后在內存受限的場景下運行Improved SeqMCE算法,從而證明其在實踐中的可行性。3.依托于Improved BK算法和Improved SeqMCE算法,為解決統(tǒng)計段時間內不同的發(fā)送垃圾短信的手機號碼總數(shù)的問題,提出了基于先驗概率的Hash算法——GHash算法。作為算法的預處理部分,首先需挖掘出等長字符串(場景下則是手機號碼)的統(tǒng)計規(guī)律。論文將字符串的各個位置抽象成一個圖的節(jié)點,創(chuàng)造性地將挖掘統(tǒng)計規(guī)律的問題轉化成尋找極長鏈的問題,進而轉化成在其傳遞閉包中挖掘極大團的問題。在使用Improved BK算法挖掘出圖中的極大團后,再將節(jié)點映射回原字符串的相應位置,從而得到原字符串的統(tǒng)計規(guī)律。為解決極大團挖掘算法為NP問題,算法耗時過久的問題,又提出了通過計算信息熵從而得到統(tǒng)計規(guī)律的一個子集的替代方案,這在手機號碼的場景下運行良好。在一個有著3000多萬手機號碼的數(shù)據(jù)集上的實驗表明,GHash算法所占用的內存遠小于使用Bitmap算法所占用的內存;速度上則遠快于使用AVL樹、紅黑樹和Trie樹完成檢索的場景;相比較布隆過濾器,GHash算法是1000%準確的;相比較傳統(tǒng)的部分Hash算法,如djb2、fnv-1、sdbm等,GHash算法無論是運行時間還是沖突數(shù),都有明顯的優(yōu)勢。
[Abstract]:In recent years, with the rise of social networks, graph mining researchers pay more attention to the related research, graph mining papers in SigKDD, ICDM, SiamDM etc. is increasing meeting. The maximum clique mining is a basic problem in graph mining, has abundant research significance and high practical value. Such as social network analysis, using e-mail network to infer the social hierarchy, structure of behavioral and cognitive networks, statistical analysis of the financial network, clustering of dynamic network, detection of various terrorist activities and emergencies. This thesis studies the maximum clique algorithm as the starting point, mining of the basic map the problem was presented on the theory research and its application in practice. The thesis is divided into three parts: 1. the research and implementation of Base BK algorithm, Improved algorithm and BK series Kose series of algorithm. Some algorithms are compared, the suitable scene analysis algorithm and spatial analysis of Base BK algorithm performance. First, this is a classic algorithm using the recursive method the earliest mining maximum. Then study the Improved series of BK algorithm, the algorithm to solve Base BK algorithm recursive tree space scale is too large, the the various heuristic node selection marker method of recursive space tree of Base BK algorithm for pruning. The famous Improved BK algorithm Improved BK algorithm and random version of the CANDIDATESNOT version of the minimum counter, the theory and implementation of the theoretical analysis. Also study the series of Kose algorithm, Kose algorithm using iterative method, mining maximal clique graph although, to avoid generating unnecessary sub group, but lead to memory consumption increases rapidly. The Kose derived from the RAM algorithm and Kose Disk algorithm, respectively. The intermediate data stored in memory and hard disk, so the cost of memory. The research and implementation of Kose RAM algorithm. The maximum clique finally take seven classical mining algorithm, implemented in a single process mode, more than 1000 experiments were done in the same experimental environment, according to the proposed.2. FAMCELM algorithm research results feedback analysis of the principle of the algorithm is given under different circumstances, how to choose the maximum clique algorithm, the operation failure of the scene, and then propose solutions under the limited memory, and by theoretical analysis and experimental results show the feasibility of the algorithm.Fast Algorithms for Maximal Clique Enumeration with Limited Memory is proposed in a large graph mining maximal clique a distributed algorithm. This algorithm has three major advantages: suitable for limited memory of the scene, reduce the mining maximal clique CPU machine consumption, can In the parallel environment. The study found that in the face of great presence of node degree in the case of SeqMCE algorithm in this paper will failure. Analysis of the causes of this problem, gives a solution to this problem and the strict theoretical proof. In the other optimization of SeqMCE is proposed Improved SeqMCE algorithm, this algorithm solves the problems encountered node degree greatly, it is repeated mining of partial global maximal clique. Finally, run the Improved SeqMCE algorithm in memory constrained scenarios, thus proving the feasibility in the practice of.3. based on Improved BK algorithm and Improved SeqMCE algorithm, for the total number of mobile phone number to send spam messages of different time. To solve the problem, Hash algorithm is proposed based on prior probability -- GHash algorithm. As a preprocessing algorithm, first of all need to dig out so long String (scenario is the mobile phone number) of the statistical law. This paper will abstract each position string nodes of a graph, creatively mining statistical regularity problem into finding the very long chain problem, and then transformed into mining maximal clique in the transitive closure of the problem. In the use of Improved BK algorithm the maximal cliques in graphs, then the corresponding node mapped back to the original string, so as to obtain the statistical regularity of the original string. In order to solve the maximum clique algorithm for NP problem, the algorithm is time-consuming too long, and put forward the alternative one by calculating the information entropy to obtain statistical rule sets, this running well in the mobile phone number of scenes. Show that in a about 30000000 mobile phone number of the data set on the experiment, the GHash algorithm is far less than the memory occupied by using the Bitmap algorithm the memory speed; Is far faster than the AVL tree, tree and Trie tree search scene; comparison of Bloom filter, GHash algorithm is 1000% accurate; part compared with the traditional Hash algorithm, such as djb2, fnv-1, sdbm, GHash algorithm both running time or the number of conflicts, have obvious advantages.

【學位授予單位】:北京郵電大學
【學位級別】:碩士
【學位授予年份】:2016
【分類號】:TP311.13

【參考文獻】

相關期刊論文 前8條

1 黃金;吳曉東;武紅斌;;哈希索引在交警專用移動執(zhí)法終端數(shù)據(jù)檢索中的應用研究[J];中國公共安全(學術版);2010年03期

2 肖波;徐前方;藺志青;郭軍;李春光;;可信關聯(lián)規(guī)則及其基于極大團的挖掘算法[J];軟件學報;2008年10期

3 辛向軍;李剛;董慶寬;肖國鎮(zhèn);;一個高效的隨機化的可驗證加密簽名方案[J];電子學報;2008年07期

4 李先通;李建中;高宏;;一種高效頻繁子圖挖掘算法[J];軟件學報;2007年10期

5 唐德權;夏耀穩(wěn);朱林立;夏幼明;;基于有向圖的關聯(lián)規(guī)則挖掘算法研究[J];云南大學學報(自然科學版);2006年S2期

6 馬根峰;哈希技術在廣東電信公話200話單處理中的應用[J];廣東通信技術;2003年07期

7 馬文平,王新梅;多發(fā)送認證碼的幾個新的構造方法[J];電子學報;2000年04期

8 陳瀅,王能斌;半結構化數(shù)據(jù)查詢的處理和優(yōu)化[J];軟件學報;1999年08期

,

本文編號:1746864

資料下載
論文發(fā)表

本文鏈接:http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1746864.html


Copyright(c)文論論文網(wǎng)All Rights Reserved | 網(wǎng)站地圖 |

版權申明:資料由用戶cbf69***提供,本站僅收錄摘要或目錄,作者需要刪除請E-mail郵箱bigeng88@qq.com