Research on Clustering of Network Big Data Based on Complex Networks
Topic: big data. Focus: complex networks. Source: Lanzhou Jiaotong University, master's thesis, 2017. Document type: degree thesis
[Abstract]: With the rapid development of communication and information technology, networks keep growing in scale and structural complexity and generate massive volumes of data, known as big data. The emergence of big data has moved human society from the information age into the big data era, in which network data exhibits complexity, diversity, and heterogeneity. In real networks, community structure (also called the clustering property) is an important feature of complex-network big data: connections within a community are relatively dense, while connections between communities are relatively sparse. Community structure is the key to and foundation of analyzing network big data, and it has significant research value and scientific importance. Community detection has become one of the most challenging research topics in data mining and many other fields. This thesis studies community detection algorithms for homogeneous and heterogeneous networks and makes the following contributions. (1) To mine overlapping community structure in complex networks effectively, the thesis proposes an overlapping community detection algorithm based on the connection similarity of maximal cliques. The algorithm uses maximal cliques to initialize the community structure of the network, quantifies the connectivity between communities from the neighbor nodes shared between cliques and the bridging edges between cliques, and merges communities on that basis to obtain a reasonable overlapping community structure. The algorithm is compared with the classical CPM algorithm on four real networks. The experimental results show that the community structure it finds improves in precision, coverage, and modularity, which indicates that the overlapping communities it discovers are reasonable.
(2) Traditional community detection algorithms for homogeneous networks cannot make full use of heterogeneous information. To address this, the thesis proposes a semantic-path-based community detection algorithm for heterogeneous networks that takes full account of the information carried by heterogeneous nodes and edges. The algorithm first selects semantic paths with the FindPath method, then extracts the similarity matrix of objects under each semantic path, and finally extracts and fuses the object features obtained under the different semantic paths and applies the K-Means algorithm to produce the final community partition. Experiments on real data sets demonstrate the effectiveness of the algorithm. (3) Existing community detection algorithms for heterogeneous networks cannot fully preserve the original structure and information of the network, and they rarely consider the case in which heterogeneous nodes belong to the same community. To address this, the thesis proposes a community detection algorithm for heterogeneous networks based on bipartite maximal cliques (maximal bicliques). The algorithm first takes the largest bipartite maximal clique containing a key node as the initial community, then expands the community according to the quantified similarity between the community and its neighbor nodes, and finally produces a reasonable community partition. Comparative experiments on synthetic and real heterogeneous networks show that the communities found by the algorithm achieve relatively high accuracy and modularity, which indicates that the algorithm can effectively discover community structure in heterogeneous networks.
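As a rough illustration of contribution (1), the sketch below seeds communities with maximal cliques and then merges pairs of communities whose connection similarity is high enough. The abstract does not give the thesis's similarity formula or merge rule, so `connection_similarity` (an average of member overlap and bridging-edge density), the `threshold` parameter, and the `min_clique_size` filter are all assumptions made for illustration; only the overall "clique initialization, then similarity-driven merging" outline comes from the text.

```python
from itertools import combinations


def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (pivoting variant)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def connection_similarity(adj, a, b):
    """Hypothetical similarity between two communities (not the thesis formula).

    Averages (i) the fraction of the smaller community that is shared and
    (ii) the density of bridging edges between the non-shared members.
    """
    shared = len(a & b)
    only_a, only_b = a - b, b - a
    bridges = sum(1 for u in only_a for v in only_b if v in adj[u])
    possible = len(only_a) * len(only_b)
    bridge_density = bridges / possible if possible else 1.0
    overlap = shared / min(len(a), len(b))
    return 0.5 * (overlap + bridge_density)


def detect_overlapping_communities(edges, threshold=0.5, min_clique_size=3):
    """Seed communities with maximal cliques, then greedily merge similar ones."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Assumption: cliques smaller than min_clique_size are not used as seeds.
    communities = [set(c) for c in maximal_cliques(adj)
                   if len(c) >= min_clique_size]

    while True:
        best = None
        for i, j in combinations(range(len(communities)), 2):
            s = connection_similarity(adj, communities[i], communities[j])
            if s >= threshold and (best is None or s > best[0]):
                best = (s, i, j)
        if best is None:
            return communities  # no pair is similar enough to merge
        _, i, j = best
        merged = communities[i] | communities[j]
        communities = [c for k, c in enumerate(communities) if k not in (i, j)]
        communities.append(merged)


if __name__ == "__main__":
    # Two triangles sharing edge 2-3, plus a 4-clique attached at node 4.
    edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4),
             (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7)]
    result = detect_overlapping_communities(edges, threshold=0.3)
    print(sorted(sorted(c) for c in result))
    # prints [[1, 2, 3, 4], [4, 5, 6, 7]]
```

With `threshold=0.3` the two triangles merge into one community while the 4-clique stays separate, so node 4 belongs to both communities, which is the kind of overlap the algorithm is meant to expose. The greedy "merge the most similar pair first" order is a design choice of this sketch, not something stated in the abstract.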
[Degree-granting institution]: Lanzhou Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13; O157.5