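Research on Key Issues in User Sampling, Measurement, and Evaluation for Large Online Social Networks (OSN)
Published: 2018-05-20 23:22
Topics: social networks + core network. Source: Beijing University of Posts and Telecommunications, 2014 master's thesis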
【Abstract】: Social network data have two major characteristics. First, the data volume is enormous: the popular social platforms in China and abroad each have more than 100 million users, and the edges between those users are far more numerous, so analyzing the network as a whole is impractical. Second, the network structure is complex: all of its relationships are organized by the users themselves and contain multi-level entity relationships, and current sampling-based methods have difficulty recovering and handling such complex internal coherence. The multi-level entity relationships inside a social network can therefore be regarded as the key issue affecting the sampling, measurement, and evaluation of social network users. This thesis carries out an exploratory study of these relationships in order to sample, measure, and evaluate social network users more effectively.
This thesis focuses on asymmetric relationships (multi-level entity relationships) in large online social networks. The approach is to obtain the hierarchical structure of a social network by sampling, then measure the attribute characteristics of the hierarchical network, and finally evaluate the measurement results. Asymmetry in a social network shows up mainly as node asymmetry, that is, the multi-level nature of nodes, which takes two forms: imbalance in user influence and asymmetry of edges. Nodes that hold an advantageous position in the network are called core nodes (the so-called "star users"), and disadvantaged nodes are called peripheral nodes. From this hierarchical view of nodes, the thesis divides a social network into three parts: the core network, the peripheral network, and the core-periphery structure. The core network is the focus of the thesis.
After discussing node hierarchy in social networks in Chapter 3, the thesis takes Sina Weibo, currently the largest and most influential microblogging platform in China, as its research object and, after cleaning the crawled data, builds a network of 35 million Sina Weibo users. First, statistical analysis gives the network's degree distribution and in-degree/out-degree ratio; the degree distribution of Sina Weibo follows a typical power law. Next, from the 35-million-user network the thesis extracts the core network formed by core users (defined as users with more than 5,000 followers) and analyzes its properties in terms of degree distribution, in-degree/out-degree ratio, clustering coefficient, network density, and edge symmetry. Then, to test how effectively different sampling methods detect the core network, it compares snowball sampling with random walk sampling. The core networks obtained by the two methods have similar densities, and both are sparser than the real network; at low sampling ratios, however, the core network obtained by snowball sampling is closer to the real network than the random-walk one on all three topological measures (network density, edge symmetry, and clustering coefficient). The thesis therefore concludes that snowball sampling has the advantage for core-network research. Finally, three experiments examine how the effectiveness of snowball sampling at detecting the core network changes with the number of seeds, the sampling depth, and the sampling ratio. The randomness of the seeds affects the speed of convergence and the sampling bias, and the sampling ratio controls the speed of expansion. The sampling depth is in effect determined by these two factors, but where it can be controlled it can be adjusted to suit the network being crawled. To the author's knowledge, this thesis is the first to focus on detecting the core of a social network and to treat that core as a characteristic of the network. It also analyzes the coverage and degree distribution of core nodes and the density of the core users' follower network, which together reflect many characteristics of the core network.
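The degree measurements described in the abstract (degree distribution, in-degree/out-degree ratio, power-law shape) can be sketched in Python. This is a hypothetical illustration, not the thesis's code: it assumes the crawl is stored as directed (follower, followee) pairs, so a user's in-degree is their follower count within the crawl, and it estimates the exponent with the approximate discrete maximum-likelihood formula of Clauset, Shalizi and Newman (2009), a technique the abstract does not say it used. The cutoff `k_min` is likewise an assumed parameter.

```python
import math
from collections import Counter

# Hypothetical input format: directed (follower, followee) pairs.
Edge = tuple[int, int]


def degrees(edges: list[Edge]) -> tuple[Counter, Counter]:
    """Return (in_degree, out_degree) counters; in-degree = follower count."""
    in_deg = Counter(v for _, v in edges)
    out_deg = Counter(u for u, _ in edges)
    return in_deg, out_deg


def in_out_ratio(in_deg: Counter, out_deg: Counter, node: int) -> float:
    """In-degree divided by out-degree; infinite for users who follow nobody."""
    out = out_deg[node]  # Counter returns 0 for unseen nodes
    return math.inf if out == 0 else in_deg[node] / out


def degree_distribution(deg: Counter) -> dict[int, float]:
    """Empirical P(k): share of nodes (among those with degree >= 1) with degree k."""
    counts = Counter(deg.values())
    total = sum(counts.values())
    return {k: c / total for k, c in sorted(counts.items())}


def powerlaw_alpha(deg_values: list[int], k_min: int = 1) -> float:
    """Approximate discrete power-law MLE (Clauset, Shalizi & Newman, 2009):
    alpha ~= 1 + n / sum(ln(k_i / (k_min - 0.5))) over degrees k_i >= k_min."""
    tail = [k for k in deg_values if k >= k_min]
    if not tail:
        raise ValueError("no degrees at or above k_min")
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)


if __name__ == "__main__":
    follows = [(1, 2), (3, 2), (4, 2), (2, 1), (4, 1)]
    in_deg, out_deg = degrees(follows)
    print(in_out_ratio(in_deg, out_deg, 2))  # 3 followers / 1 followee
    print(degree_distribution(in_deg))
    print(round(powerlaw_alpha(list(in_deg.values()), k_min=1), 3))
```

On a real crawl the exponent is only meaningful for a sensibly chosen `k_min` and a large tail; a toy graph like the one above merely exercises the code.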
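Extracting the core network and computing the properties the abstract lists can be sketched as follows. Only the 5,000-follower threshold comes from the thesis; the rest is assumed for illustration: follower counts are approximated by in-degree within the crawled edge set (a crawler could instead read them from user profiles), edge symmetry is measured as reciprocity (the share of directed edges whose reverse edge also exists), and the clustering coefficient is the average local coefficient of the undirected version of the graph.

```python
from collections import Counter

CORE_THRESHOLD = 5000  # thesis definition: more than 5,000 followers


def core_users(edges, threshold=CORE_THRESHOLD):
    """Users whose in-degree exceeds `threshold`.
    Assumption: follower count approximated by in-degree within the crawl."""
    followers = Counter(v for _, v in edges)
    return {u for u, c in followers.items() if c > threshold}


def split_network(edges, core):
    """Partition directed edges into core, peripheral and core-periphery parts."""
    parts = {"core": set(), "peripheral": set(), "core_periphery": set()}
    for u, v in edges:
        if u in core and v in core:
            parts["core"].add((u, v))
        elif u not in core and v not in core:
            parts["peripheral"].add((u, v))
        else:
            parts["core_periphery"].add((u, v))
    return parts


def density(nodes, edges):
    """Directed density |E| / (n(n - 1)); assumes no self-loops."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def edge_symmetry(edges):
    """Reciprocity: share of directed edges whose reverse edge is present."""
    edges = set(edges)
    return sum((v, u) in edges for u, v in edges) / len(edges) if edges else 0.0


def avg_clustering(nodes, edges):
    """Mean local clustering coefficient on the undirected graph."""
    nbrs = {n: set() for n in nodes}
    for u, v in edges:
        if u != v and u in nbrs and v in nbrs:  # ignore loops and outside nodes
            nbrs[u].add(v)
            nbrs[v].add(u)
    total = 0.0
    for ns in nbrs.values():
        k = len(ns)
        if k >= 2:  # nodes with fewer than two neighbours contribute 0
            links = sum(1 for a in ns for b in ns if a < b and b in nbrs[a])
            total += 2 * links / (k * (k - 1))
    return total / len(nbrs) if nbrs else 0.0
```

On a toy graph no user has 5,000 followers, so pass a small `threshold` (for example `core_users(edges, threshold=2)`); the core network's density is then `density(core, split_network(edges, core)["core"])`.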
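The two samplers the thesis compares, and the three snowball parameters it varies (number of seeds, sampling depth, sampling ratio), can be illustrated with the sketch below. It is an assumption-laden illustration, not the thesis's crawler: it walks a directed adjacency dict, treats "sampling ratio" as the share of a node's not-yet-sampled neighbours taken in each wave and "depth" as the number of waves, and lets the random walk jump back to a random seed when it reaches a node with no out-links. For a like-for-like comparison the random walk gets the same node budget as the snowball sample, and each sample is scored by the density of the core network it contains.

```python
import random


def build_adjacency(edges):
    """Directed adjacency lists (follower -> followees), sorted for reproducibility."""
    adj = {}
    for u, v in set(edges):
        adj.setdefault(u, []).append(v)
    return {u: sorted(vs) for u, vs in adj.items()}


def snowball_sample(adj, seeds, depth, ratio, rng):
    """Snowball sampling from a list of seeds over `depth` waves; each frontier
    node adds a random `ratio` share (at least one) of its unsampled neighbours.
    Assumed interpretation of the thesis's depth and sampling-ratio parameters."""
    sampled = set(seeds)
    frontier = list(seeds)
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            fresh = [n for n in adj.get(node, []) if n not in sampled]
            if not fresh:
                continue
            k = min(len(fresh), max(1, round(ratio * len(fresh))))
            for n in rng.sample(fresh, k):
                sampled.add(n)
                next_frontier.append(n)
        frontier = next_frontier
    return sampled


def random_walk_sample(adj, seeds, budget, rng, max_steps=100_000):
    """Simple random walk that jumps to a random seed at dead ends and
    stops once `budget` distinct nodes have been visited."""
    current = rng.choice(seeds)
    visited = {current}
    for _ in range(max_steps):
        if len(visited) >= budget:
            break
        nbrs = adj.get(current, [])
        current = rng.choice(nbrs) if nbrs else rng.choice(seeds)
        visited.add(current)
    return visited


def sampled_core_density(edges, core, sample):
    """Directed density of the core network restricted to the sampled nodes."""
    kept = core & sample
    n = len(kept)
    if n < 2:
        return 0.0
    inner = {(u, v) for u, v in edges if u in kept and v in kept and u != v}
    return len(inner) / (n * (n - 1))


if __name__ == "__main__":
    rng = random.Random(0)
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2), (2, 0)]
    adj = build_adjacency(edges)
    core = {0, 2}  # hypothetical core users for the toy graph
    snow = snowball_sample(adj, seeds=[0], depth=2, ratio=0.5, rng=rng)
    walk = random_walk_sample(adj, seeds=[0], budget=len(snow), rng=rng)
    print(sampled_core_density(edges, core, snow),
          sampled_core_density(edges, core, walk))
```

Repeating such runs over several seed sets, depths, and ratios and comparing each density with that of the full core network mirrors, in miniature, the kind of parameter sweep the abstract describes; numbers from a toy graph say nothing about Sina Weibo itself.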
【Degree-granting institution】: Beijing University of Posts and Telecommunications
【Degree level】: Master's
【Year conferred】: 2014
【Classification number】: TP393.0
Article ID: 1916697
Link: http://sikaile.net/guanlilunwen/ydhl/1916697.html