天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

基于標(biāo)簽傳播算法的社區(qū)發(fā)現(xiàn)新算法

發(fā)布時(shí)間:2018-08-12 19:39
【摘要】:互聯(lián)網(wǎng)近年來(lái)的飛速發(fā)展,造就了一批以社交為主的網(wǎng)站,其中國(guó)外的Facebook,Twitter,Google+,國(guó)內(nèi)的QQ空間,豆瓣,人人等最為流行。這些社交網(wǎng)站每天都會(huì)有大量的用戶使用,并且產(chǎn)生大量的分享數(shù)據(jù),建立新的朋友關(guān)系等。對(duì)于這些數(shù)據(jù)來(lái)說(shuō),具有很高的利用價(jià)值,比如網(wǎng)絡(luò)營(yíng)銷(xiāo),輿情分析等。因此對(duì)這些數(shù)據(jù)的處理方式尤為重要,其中社區(qū)發(fā)現(xiàn)是研究熱點(diǎn)。本文主要是針對(duì)目前常用的社區(qū)發(fā)現(xiàn)算法進(jìn)行改進(jìn)。最常用的標(biāo)簽傳播算法LPA的思想是在初始情況下,網(wǎng)絡(luò)中的每個(gè)節(jié)點(diǎn)都被初始化為唯一的標(biāo)簽,在迭代更新每個(gè)節(jié)點(diǎn)的標(biāo)簽時(shí),節(jié)點(diǎn)是根據(jù)其鄰居節(jié)點(diǎn)中標(biāo)簽個(gè)數(shù)最多的作為更新標(biāo)簽,如果標(biāo)簽個(gè)數(shù)最多的標(biāo)簽并不是唯一的,那么隨機(jī)從中選擇一個(gè)標(biāo)簽來(lái)更新當(dāng)前節(jié)點(diǎn),最終達(dá)到收斂或者震蕩,該算法停止。由于該算法的思想和實(shí)現(xiàn)過(guò)程導(dǎo)致該算法有一些缺點(diǎn),例如不穩(wěn)定性,發(fā)現(xiàn)的社區(qū)要么是巨型社區(qū),要么就是無(wú)意義的小型社區(qū),分布極其不均勻并且對(duì)網(wǎng)絡(luò)的結(jié)構(gòu)比較敏感,二分網(wǎng)絡(luò)情況下會(huì)發(fā)生循環(huán)震蕩。針對(duì)上述LPA算法的一系列缺點(diǎn),本文提出了LAAPA(label-attributeattenuation progagation algorithm),基于標(biāo)簽屬性和衰減因素的社區(qū)發(fā)現(xiàn)算法。該算法引人了傳播衰減因子和節(jié)點(diǎn)屬性,其中傳播衰減因子顧名思義就是節(jié)點(diǎn)標(biāo)簽的傳播距離是有限的,并且隨著距離的增加節(jié)點(diǎn)標(biāo)簽的影響力逐漸降低,更新標(biāo)簽的權(quán)重也隨之減少;節(jié)點(diǎn)屬性是指在社交網(wǎng)絡(luò)中節(jié)點(diǎn)之間的關(guān)系并不僅僅是關(guān)注與被關(guān)注這種簡(jiǎn)單的傳統(tǒng)上的“邊”,為了與實(shí)際情況更加吻合本文提出了節(jié)點(diǎn)屬性,具體是指將節(jié)點(diǎn)的其他屬性比如豆瓣網(wǎng)中用戶加入的“小組”作為節(jié)點(diǎn)的屬性,節(jié)點(diǎn)之間的相同屬性將會(huì)反映到節(jié)點(diǎn)之間的邊上,使用權(quán)值來(lái)表示。在算法迭代中,節(jié)點(diǎn)更新標(biāo)簽時(shí),將考慮鄰居節(jié)點(diǎn)的標(biāo)簽傳播距離,節(jié)點(diǎn)之間邊的權(quán)值,節(jié)點(diǎn)的度等因素。本文通過(guò)兩組標(biāo)準(zhǔn)數(shù)據(jù)對(duì)提出的LAAPA算法和LPA算法進(jìn)行比較,在社區(qū)大小,模塊度等方面LAAPA算法發(fā)現(xiàn)的社區(qū)比LPA算法效果好。在使用Scrapy抓取的豆瓣網(wǎng)數(shù)據(jù)中,經(jīng)過(guò)清洗格式化后驗(yàn)證了該數(shù)據(jù)符合社交網(wǎng)絡(luò)的特性,“小世界”,節(jié)點(diǎn)中心性等。利用兩種社區(qū)發(fā)現(xiàn)算法進(jìn)行社區(qū)發(fā)現(xiàn),進(jìn)行對(duì)比,結(jié)果也是LAAPA算法發(fā)現(xiàn)的社區(qū)質(zhì)量比LPA算法質(zhì)量高,并且穩(wěn)定,社區(qū)分布均勻,沒(méi)有巨型社區(qū)的出現(xiàn)。
[Abstract]:The rapid development of the Internet in recent years has created a number of social-oriented websites, including Facebook Twitter Google, domestic QZone, Douban, everyone and so on the most popular. These social networking sites are used by a large number of users every day and generate a lot of data sharing, new friendships and so on. For these data, it has a high value, such as network marketing, public opinion analysis and so on. Therefore, the processing of these data is particularly important, among which community discovery is a hot topic. This paper is mainly aimed at the improvement of community discovery algorithm commonly used at present. The idea of LPA, the most commonly used tag propagation algorithm, is that, initially, each node in the network is initialized as a unique tag, and each node's label is updated iteratively. The node updates the label according to the number of tags in the neighbor node. If the label with the largest number of tags is not unique, then randomly select a label from which to update the current node, and finally achieve convergence or oscillation. The algorithm stops. As a result of the thought and implementation process of the algorithm, the algorithm has some disadvantages, such as instability, the communities found are either giant communities or meaningless small communities, which are extremely unevenly distributed and sensitive to the structure of the network. In the case of binary networks, cyclic oscillations occur. In view of a series of shortcomings of the above LPA algorithm, this paper proposes a community discovery algorithm based on label attributes and attenuation factors for label-attributeattenuation progagation algorithm), (label-attributeattenuation progagation algorithm),). The algorithm induces propagation attenuation factor and node attribute, in which the propagation attenuation factor is, as the name implies, that the propagation distance of the node label is limited, and the influence of the node label decreases gradually with the increase of the distance. The weight of updating tags is also reduced; the node attribute refers to the relationship between nodes in the social network is not only concerned with and be concerned about this simple traditional "edge", in order to more consistent with the actual situation, this paper proposed node attributes. It refers to the other attributes of nodes, such as the "group" added by users in the Douban network, and the same attributes between nodes will be reflected to the edge of the nodes, and the value of the right to use will be used to express them. In the iteration of the algorithm, the label propagation distance of the neighbor node, the weight of the edge between nodes, the degree of the node and so on will be taken into account when the node updates the label. This paper compares the proposed LAAPA algorithm with LPA algorithm through two sets of standard data. The community found by LAAPA algorithm is better than that of LPA algorithm in community size, module degree and so on. In the data collected by Scrapy, it is verified that the data conforms to the characteristics of social network, "small world", node centrality and so on. Compared with the two community discovery algorithms, the results show that the community quality of LAAPA algorithm is higher than that of LPA algorithm, and the quality of community is stable, the community distribution is even, and there is no giant community.
【學(xué)位授予單位】:西安電子科技大學(xué)
【學(xué)位級(jí)別】:碩士
【學(xué)位授予年份】:2015
【分類號(hào)】:TP301.6

【參考文獻(xiàn)】

相關(guān)期刊論文 前6條

1 梁宗文;楊帆;李建平;;基于節(jié)點(diǎn)相似性度量的社團(tuán)結(jié)構(gòu)劃分方法[J];計(jì)算機(jī)應(yīng)用;2015年05期

2 陽(yáng)廣元;曹霞;甯佐斌;潘煦;;國(guó)內(nèi)社區(qū)發(fā)現(xiàn)研究進(jìn)展[J];情報(bào)資料工作;2014年02期

3 武志昊;林友芳;Steve Gregory;萬(wàn)懷宇School of Computer and Information Technology,Beijing Jiaotong University;田盛豐;;Balanced Multi-Label Propagation for Overlapping Community Detection in Social Networks[J];Journal of Computer Science & Technology;2012年03期

4 韓瑞凱;孟嗣儀;劉云;郭英慧;張彥超;;基于興趣相似度的社區(qū)結(jié)構(gòu)發(fā)現(xiàn)算法研究[J];鐵路計(jì)算機(jī)應(yīng)用;2010年10期

5 淦文燕;赫南;李德毅;王建民;;一種基于拓?fù)鋭?shì)的網(wǎng)絡(luò)社區(qū)發(fā)現(xiàn)方法[J];軟件學(xué)報(bào);2009年08期

6 沈華偉;程學(xué)旗;陳海強(qiáng);劉悅;;基于信息瓶頸的社區(qū)發(fā)現(xiàn)[J];計(jì)算機(jī)學(xué)報(bào);2008年04期

,

本文編號(hào):2180147

資料下載
論文發(fā)表

本文鏈接:http://sikaile.net/guanlilunwen/yingxiaoguanlilunwen/2180147.html


Copyright(c)文論論文網(wǎng)All Rights Reserved | 網(wǎng)站地圖 |

版權(quán)申明:資料由用戶26f37***提供,本站僅收錄摘要或目錄,作者需要?jiǎng)h除請(qǐng)E-mail郵箱bigeng88@qq.com