異構(gòu)社交網(wǎng)絡(luò)中社區(qū)發(fā)現(xiàn)算法研究
發(fā)布時(shí)間:2018-03-23 09:01
本文選題:社區(qū)發(fā)現(xiàn) 切入點(diǎn):異構(gòu)社交網(wǎng)絡(luò) 出處:《中國礦業(yè)大學(xué)(北京)》2016年博士論文 論文類型:學(xué)位論文
【摘要】:社會(huì)網(wǎng)絡(luò)中的社區(qū)發(fā)現(xiàn)作為數(shù)據(jù)挖掘研究領(lǐng)域的一個(gè)熱點(diǎn),近幾年發(fā)展迅速,研究內(nèi)容主要集中在通過對(duì)網(wǎng)絡(luò)中存在的關(guān)系進(jìn)行分析,得到社區(qū)劃分的結(jié)果。隨著Web2.0的興起和社交網(wǎng)絡(luò)的蓬勃發(fā)展,出現(xiàn)了多種新型的在線社交方式,單一的社會(huì)網(wǎng)絡(luò)關(guān)系結(jié)構(gòu)已經(jīng)不足以應(yīng)對(duì)解決現(xiàn)實(shí)世界的問題,所以學(xué)者們進(jìn)一步提出了異構(gòu)社交網(wǎng)絡(luò)(Heterogeneous Social Networks)的概念。這是一個(gè)復(fù)雜的網(wǎng)絡(luò)抽象結(jié)構(gòu),網(wǎng)絡(luò)中通常包含多種關(guān)系和實(shí)體,這些不同的關(guān)系和實(shí)體組合形成了網(wǎng)絡(luò)的多樣化復(fù)雜結(jié)構(gòu)。如何處理這些復(fù)雜的結(jié)構(gòu)和獲取社區(qū)結(jié)構(gòu)信息,是對(duì)傳統(tǒng)的社會(huì)網(wǎng)絡(luò)社區(qū)發(fā)現(xiàn)的一個(gè)新的挑戰(zhàn)。本文將針對(duì)劃分異構(gòu)網(wǎng)絡(luò)過程中,多維度、多維復(fù)雜關(guān)系、多類型節(jié)點(diǎn)等特性所帶來的網(wǎng)絡(luò)數(shù)據(jù)重構(gòu)與降維問題;傳統(tǒng)研究僅僅局限于圖鏈接關(guān)系,并未考慮語義信息,也就是主題對(duì)劃分社區(qū)的幫助作用;再者傳統(tǒng)劃分算法需要預(yù)先知識(shí)和預(yù)先設(shè)定社區(qū)個(gè)數(shù)來得到劃分結(jié)果,但真實(shí)世界社交網(wǎng)絡(luò)中的社區(qū)個(gè)數(shù)往往是不可知的,尤其在大規(guī)模的社交網(wǎng)絡(luò)中預(yù)先知識(shí)不可知等問題展開研究。主要包括異構(gòu)社交網(wǎng)絡(luò)通用分析框架,基于標(biāo)簽傳播近似線性的社區(qū)發(fā)現(xiàn)算法,基于主題感知的社區(qū)發(fā)現(xiàn)算法等,從而設(shè)計(jì)出高效快速的異構(gòu)網(wǎng)絡(luò)社區(qū)發(fā)現(xiàn)算法。研究內(nèi)容及創(chuàng)新點(diǎn)包括:1)提出了一種異構(gòu)社交網(wǎng)絡(luò)分析框架,針對(duì)異構(gòu)網(wǎng)絡(luò)進(jìn)行數(shù)據(jù)重構(gòu),利用降維方法得到同構(gòu)網(wǎng)絡(luò)或者二分圖,然后使用社區(qū)發(fā)現(xiàn)算法對(duì)同構(gòu)網(wǎng)絡(luò)或者二分圖進(jìn)行社區(qū)劃分,從而將異構(gòu)網(wǎng)絡(luò)社區(qū)劃分問題進(jìn)行有效轉(zhuǎn)化。2)將多維異構(gòu)網(wǎng)絡(luò)轉(zhuǎn)化為同構(gòu)網(wǎng)絡(luò)后,提出了一種并行種子擴(kuò)展算法PHSE,用來發(fā)現(xiàn)社交網(wǎng)絡(luò)中的重疊社區(qū)結(jié)構(gòu)。算法PHSE通過局部適應(yīng)度函數(shù)優(yōu)化和混合種子擴(kuò)展策略來得到自然社區(qū)。相較于算法LFM,算法PHSE不僅在合成網(wǎng)絡(luò)中有非常好的劃分結(jié)果,同時(shí)在真實(shí)世界社交網(wǎng)絡(luò)中也有非常好的劃分結(jié)果。尤其,當(dāng)合成網(wǎng)絡(luò)的節(jié)點(diǎn)重疊度高達(dá)On=50%時(shí),依舊可以準(zhǔn)確的劃分出重疊社區(qū)。3)提出了一種基于標(biāo)簽傳播的社區(qū)發(fā)現(xiàn)算法iSLPA,同時(shí)支持有向圖、無向圖和二分圖的社區(qū)劃分,算法在迭代過程中采用標(biāo)簽混合更新模式,并且在真實(shí)社交網(wǎng)絡(luò)數(shù)據(jù)集上表現(xiàn)出準(zhǔn)確的社區(qū)劃分結(jié)果。4)提出了一種基于并行計(jì)算框架Dpark的標(biāo)簽傳播算法HLPA,針對(duì)不同的網(wǎng)絡(luò)使用了不同的節(jié)點(diǎn)標(biāo)簽初始化策略,包括有向圖、無向圖和二分圖,同時(shí)還使用了混合標(biāo)簽更新策略使得算法更加穩(wěn)定,標(biāo)簽衰減策略使得算法可以避免劃分出“monster”社區(qū),也可以使得較小的社區(qū)得到充分成長。相較于之前的基于標(biāo)簽傳播的算法,算法HLPA在劃分benchmark基準(zhǔn)真實(shí)社交網(wǎng)絡(luò)時(shí)表現(xiàn)出了非常高的準(zhǔn)確性,而且在劃分大規(guī)模真實(shí)社交網(wǎng)絡(luò)時(shí)表現(xiàn)得非常有競爭力,大大提高算法效率,針對(duì)300萬節(jié)點(diǎn)、1.7億條邊的二分圖劃分社區(qū)只需要37.12分鐘,而且通過分析驗(yàn)證了劃分出的社區(qū)是有真實(shí)含義的社區(qū)結(jié)構(gòu)。5)根據(jù)1)中提出的異構(gòu)社交網(wǎng)絡(luò)分析框架,提出了一種基于主題感知的異構(gòu)網(wǎng)絡(luò)社區(qū)發(fā)現(xiàn)算法,該算法通過對(duì)數(shù)據(jù)重構(gòu)將多模網(wǎng)絡(luò)轉(zhuǎn)化為二模網(wǎng)絡(luò)(用戶-文檔),采用算法LDA-light從異構(gòu)網(wǎng)絡(luò)轉(zhuǎn)化得到的二模網(wǎng)絡(luò)映射為帶權(quán)重的二分圖網(wǎng)絡(luò)(用戶-主題),采用新提出的帶權(quán)重的二分圖社區(qū)發(fā)現(xiàn)算法WLPA對(duì)二分圖進(jìn)行社區(qū)劃分,最終將用戶和主題兩種不同的實(shí)體劃分在同一社區(qū)內(nèi),即劃分出的社區(qū)帶有語義信息,從而可以更好地進(jìn)行社區(qū)結(jié)構(gòu)分析。本文提出的社區(qū)發(fā)現(xiàn)算法具有一般性,可以推廣到許多同構(gòu)或者異構(gòu)社交網(wǎng)絡(luò)和數(shù)據(jù)集,并且可以應(yīng)用到更廣泛的實(shí)際問題中。
[Abstract]:In a social network community discovery as the data mining is a hot research field, the rapid development in recent years, the study focused on the relationship existing in the network analysis, get the results of community division. With the rise of Web2.0 and social networks flourish, the emergence of a variety of new online socializing. The social network structure is not enough to cope with solve real-world problems, so scholars further proposed heterogeneous social network (Heterogeneous Social Networks) concept. It is a complex network abstract structure, the network usually contains a variety of relations and entities, and the relationship between these different entities combined to form a network diverse and complex structure. How to deal with the complicated structure and obtain the community structure information, is a new challenge to the traditional social network community discovery. This article will focus on the classification of heterogeneous networks in the process of multi dimension, multi dimension and complex relationship, and dimension of network data reconstruction caused by multiple types of nodes and other characteristics of the problem; the traditional study focused only on map links, did not consider the semantic information, also is the theme of community with the help of the division; in addition to the traditional division algorithm requires knowledge and advance set number of community division results get in advance, but the number of community real world social networks are often unknown, especially in large-scale social networks prior knowledge of unknown issues. The research mainly includes the analysis of the general framework of heterogeneous social networks, discovery algorithm approximate linear label propagation algorithm based on community, theme the perception of the community based on, to design a heterogeneous network community discovery algorithm is efficient and fast. The research contents and innovations include: 1) proposed a different The social network analysis framework for data reconstruction in heterogeneous networks, using dimensionality reduction method to get two points or homogeneous network graph, and then use the community discovery algorithm of homogeneous network community division or two figure, which will be the effective transformation of.2 community into the heterogeneous network) will be transformed into multidimensional heterogeneous network homogeneous network, is proposed an extended PHSE algorithm parallel seeds, used to find the overlapping community structure in social networks. The PHSE algorithm through local fitness function optimization and hybrid seed expansion strategy to get the natural community. Compared to the LFM algorithm, PHSE algorithm not only has very good classification results in the synthesis of the network, but also has very good classification results in the real world social networks. Especially, when the node network synthesis overlap is as high as On=50%, still can be accurately divided into overlapping community.3) put forward a ISLPA found the label propagation algorithm based on community, while supporting the directed graph, undirected graph and graph two community division algorithm using hybrid label update mode in the iterative process, and show the community classification results accurate.4 in the data set on the real social network) proposed a parallel HLPA label propagation algorithm framework based on Dpark, according to the different network using node label different initialization strategies, including directed graph, undirected graph and two points, while also using the hybrid tag update strategy which makes the algorithm more stable label attenuation strategy makes the algorithm can avoid the division of the "monster" community, also can make small the community has been fully grow. Compared to the previous label propagation algorithm based on HLPA algorithm in the division of benchmark benchmark real social network showed very high accuracy, but also in row Divided into large-scale real social networks is very competitive, greatly improve the efficiency of the algorithm for the 3 million node, two graph partitioning community 170 million edges need 37.12 minutes only, and verified through the analysis of the community is a community structure of.5) 1) according to the true meaning of the heterogeneous social network analysis framework proposed that presents a heterogeneous network community discovery algorithm based on perception of the subject, based on data reconstruction multimode networks into the network (the second mock exam user - document), use LDA-light algorithm to get the second mock exam network mapping transformation from heterogeneous network two network graph with weight (user topic), using the new with the weight of the two figure WLPA community discovery algorithm of community division two figure, will end users and theme two different entity in the same community, which is divided into the community with Semantic information can better analyze community structure. The community detection algorithm proposed in this paper is general, and can be extended to many isomorphic or heterogeneous social networks and data sets, and it can be applied to a wider range of practical problems.
【學(xué)位授予單位】:中國礦業(yè)大學(xué)(北京)
【學(xué)位級(jí)別】:博士
【學(xué)位授予年份】:2016
【分類號(hào)】:O157.5;TP393.09
【參考文獻(xiàn)】
相關(guān)期刊論文 前5條
1 辛宇;楊靜;謝志強(qiáng);;基于標(biāo)簽傳播的語義重疊社區(qū)發(fā)現(xiàn)算法[J];自動(dòng)化學(xué)報(bào);2014年10期
2 王莉;程學(xué)旗;;在線社會(huì)網(wǎng)絡(luò)的動(dòng)態(tài)社區(qū)發(fā)現(xiàn)及演化[J];計(jì)算機(jī)學(xué)報(bào);2015年02期
3 張俊麗;常艷麗;師文;;標(biāo)簽傳播算法理論及其應(yīng)用研究綜述[J];計(jì)算機(jī)應(yīng)用研究;2013年01期
4 劉欣;Tsuyoshi Murata;;Detecting Communities in K-Partite K-Uniform (Hyper)Networks[J];Journal of Computer Science & Technology;2011年05期
5 楊博;劉大有;金弟;馬海賓;;復(fù)雜網(wǎng)絡(luò)聚類方法[J];軟件學(xué)報(bào);2009年01期
,本文編號(hào):1652785
本文鏈接:http://sikaile.net/shoufeilunwen/jckxbs/1652785.html
最近更新
教材專著