基于結(jié)構(gòu)相似度的大規(guī)模社交網(wǎng)絡(luò)聚類算法研究
本文選題:有向圖 + 并行算法; 參考:《南開大學(xué)》2013年碩士論文
【摘要】:社交網(wǎng)絡(luò)為社交系統(tǒng)下個(gè)體之間的關(guān)系所組成的復(fù)雜網(wǎng)絡(luò)結(jié)構(gòu)。隨著信息時(shí)代互聯(lián)網(wǎng)科技的迅猛發(fā)展,社交網(wǎng)絡(luò),特別是在線社交網(wǎng)絡(luò),已成為人與人之間分享信息不可或缺的媒介。社交網(wǎng)絡(luò)中個(gè)體之間的相互連接關(guān)系作為信息的傳播途徑,在很多方面有不可忽視的作用。如廣告投遞,潛在商機(jī)發(fā)現(xiàn),效果預(yù)測以及危機(jī)預(yù)警。因此如何從這些龐大的網(wǎng)絡(luò)中獲取有價(jià)值的信息成為了目前重要的研究課題。網(wǎng)絡(luò)結(jié)構(gòu)分析也吸引了眾多研究者的關(guān)注,其中的網(wǎng)絡(luò)聚類即是一種有效的結(jié)構(gòu)分析手段和途徑。 然而目前的網(wǎng)絡(luò)聚類算法仍面臨重大的挑戰(zhàn)。首先,現(xiàn)有網(wǎng)絡(luò)聚類算法沒有充分考慮實(shí)際社交網(wǎng)絡(luò)的特性。對社交網(wǎng)絡(luò)的結(jié)構(gòu)分析不同于一般網(wǎng)絡(luò)聚類,社交網(wǎng)絡(luò)中常常存在一些具有特殊作用的點(diǎn),同時(shí)節(jié)點(diǎn)間的社交關(guān)系大多為有向的。其次,沒有將大規(guī)模網(wǎng)絡(luò)數(shù)據(jù)的處理作為目標(biāo)。本文針對現(xiàn)有社交網(wǎng)絡(luò)聚類中所面臨的問題,提出了一種面向大規(guī)模有向網(wǎng)絡(luò)的結(jié)構(gòu)聚類算法。 首先,提出了基于結(jié)構(gòu)相似度的處理有向網(wǎng)絡(luò)的聚類方法。本文對有向網(wǎng)絡(luò)進(jìn)行聚類操作提出了兩種不同的方法:1.提出一種兩階段方法,首先將有向網(wǎng)絡(luò)近似為無向網(wǎng)絡(luò),再使用結(jié)構(gòu)相似度聚類算法進(jìn)行結(jié)構(gòu)分析;2.對現(xiàn)有的針對無向網(wǎng)絡(luò)的方法進(jìn)行改進(jìn)使其能夠直接對有向網(wǎng)絡(luò)進(jìn)行聚類。 其次,針對社交網(wǎng)絡(luò)的大規(guī)模特性,本文研究了如何將原本非并行的基于結(jié)構(gòu)相似度的聚類算法進(jìn)行并行化,使其能夠處理大規(guī)模網(wǎng)絡(luò)數(shù)據(jù)。算法中,針對社交網(wǎng)絡(luò)數(shù)據(jù)特性,設(shè)計(jì)了合理的數(shù)據(jù)劃分策略,各機(jī)器之間的數(shù)據(jù)交換策略。本文對算法進(jìn)行了理論分析,證明采用這種高效率的并行編程框架實(shí)現(xiàn)的并行網(wǎng)絡(luò)結(jié)構(gòu)聚類算法的結(jié)果與原非并行算法的結(jié)果是一致的。 最后,本文基于MapReduce并行架構(gòu)實(shí)現(xiàn)了所提出的并行式網(wǎng)絡(luò)聚類算法。大量實(shí)驗(yàn)結(jié)果表明本文提出的算法能夠提高有向網(wǎng)絡(luò)聚類算法的準(zhǔn)確度,同時(shí)并行方法能夠有效處理大規(guī)模的網(wǎng)絡(luò)聚類問題。 綜上所述,本文在有向社交網(wǎng)絡(luò)并行聚類問題上取得了一定的進(jìn)展和效果,在社交網(wǎng)絡(luò)的結(jié)構(gòu)信息發(fā)現(xiàn)相關(guān)領(lǐng)域有很好的應(yīng)用前景。
[Abstract]:A complex network structure consisting of relationships between individuals in a social system. With the rapid development of Internet technology in the information age, social networks, especially online social networks, have become an indispensable medium for sharing information among people. As a way to spread information, the interconnectedness of individuals in social networks plays an important role in many aspects. Such as advertising delivery, potential business opportunities discovery, effect prediction and crisis warning. Therefore, how to obtain valuable information from these huge networks has become an important research topic. Network structure analysis has also attracted the attention of many researchers, among which network clustering is an effective means and approach to structure analysis. However, the current network clustering algorithm still faces great challenges. Firstly, the existing clustering algorithms do not fully consider the characteristics of real social networks. The structure analysis of social network is different from that of general network clustering. There are some special points in social network, and the social relations between nodes are mostly directed. Second, the processing of large-scale network data has not been targeted. In this paper, a structural clustering algorithm for large scale directed networks is proposed to solve the problems existing in the existing social network clustering. Firstly, a clustering method based on structural similarity is proposed to deal with directed networks. In this paper, we propose two different methods of clustering for directed networks: 1. In this paper, a two-stage approach is proposed. Firstly, the directed network is approximated as an undirected network, and then the structural similarity clustering algorithm is used to analyze the structure. The existing methods for undirected networks are improved to cluster the directed networks directly. Secondly, in view of the large-scale characteristics of social networks, this paper studies how to parallelize the original non-parallel clustering algorithm based on structural similarity to enable it to deal with large-scale network data. In the algorithm, a reasonable data partition strategy and a data exchange strategy between different machines are designed according to the data characteristics of social network. In this paper, the theoretical analysis of the algorithm is carried out, and it is proved that the result of the parallel network clustering algorithm implemented by this efficient parallel programming framework is consistent with that of the original non-parallel algorithm. Finally, the proposed parallel network clustering algorithm based on MapReduce parallel architecture is implemented. A large number of experimental results show that the proposed algorithm can improve the accuracy of the directed network clustering algorithm, and the parallel method can effectively deal with large-scale network clustering problems. To sum up, this paper has made some progress and effect on the parallel clustering of directed social networks, and has a good application prospect in the related fields of structural information discovery of social networks.
【學(xué)位授予單位】:南開大學(xué)
【學(xué)位級別】:碩士
【學(xué)位授予年份】:2013
【分類號】:TP311.13;O157.5
【共引文獻(xiàn)】
相關(guān)期刊論文 前10條
1 郭進(jìn)時(shí);湯紅波;王曉雷;;基于社會(huì)網(wǎng)絡(luò)增量的動(dòng)態(tài)社區(qū)組織探測[J];電子與信息學(xué)報(bào);2013年09期
2 郎波;張博宇;;面向大數(shù)據(jù)的非結(jié)構(gòu)化數(shù)據(jù)管理平臺(tái)關(guān)鍵技術(shù)[J];信息技術(shù)與標(biāo)準(zhǔn)化;2013年10期
3 邵景峰;崔尊民;王進(jìn)富;白曉波;;大數(shù)據(jù)下紡織制造執(zhí)行系統(tǒng)的構(gòu)建[J];紡織器材;2013年06期
4 張亞楠;譚躍生;;基于MapReduce的并行遮蓋文本聚類算法[J];內(nèi)蒙古科技大學(xué)學(xué)報(bào);2013年03期
5 張毅;曹晶晶;齊莉娜;吳必虎;;旅游目的地虛擬網(wǎng)絡(luò)結(jié)構(gòu)特征研究——以黃山市為例[J];北京大學(xué)學(xué)報(bào)(自然科學(xué)版);2013年06期
6 周國亮;朱永利;王桂蘭;;CC-MRSJ:Hadoop平臺(tái)下緩存敏感的星型聯(lián)接算法[J];電信科學(xué);2013年10期
7 章祥蓀;張忠元;;非負(fù)矩陣分解:模型、算法和應(yīng)用[J];重慶師范大學(xué)學(xué)報(bào)(自然科學(xué)版);2013年06期
8 周濤;張子柯;陳關(guān)榮;汪小帆;史定華;狄增如;樊瑛;方錦清;韓筱璞;劉建國;劉潤然;劉宗華;陸君安;呂金虎;呂琳媛;榮智海;汪秉宏;許小可;章忠志;;復(fù)雜網(wǎng)絡(luò)研究的機(jī)遇與挑戰(zhàn)[J];電子科技大學(xué)學(xué)報(bào);2014年01期
9 王偉;楊慧;龔凱;唐明;都永海;;復(fù)雜網(wǎng)絡(luò)上的局域免疫研究[J];電子科技大學(xué)學(xué)報(bào);2013年06期
10 劉瑩;劉國奇;任介夫;姜琳穎;張斌;;基于Web服務(wù)復(fù)雜網(wǎng)絡(luò)的服務(wù)社區(qū)構(gòu)建方法[J];東南大學(xué)學(xué)報(bào)(自然科學(xué)版);2013年06期
,本文編號:1840736
本文鏈接:http://sikaile.net/wenyilunwen/guanggaoshejilunwen/1840736.html