Research on an Adaptive UNI64 Sampling Method for Online Social Networks
Published: 2018-06-03 17:12
Topics: online social networks + sampling methods; Source: Beijing University of Chemical Technology, master's thesis, 2016
[Abstract]: The rise of online social networks (OSNs) has transformed the Internet, and many of their characteristics have had a broad and deep impact on society. In recent years they have attracted many researchers. Because online social networks are large-scale networks with complex properties and behavior patterns, the complete data of a real network cannot be obtained accurately, so most studies are carried out on sample networks drawn from the real network. The quality of the sample network is therefore critical to the research results, and obtaining, through a suitable sampling method, a network sample that reflects one or more characteristics of the real network is a prerequisite for online social network research. Researchers have proposed many network sampling methods, but an unbiased, uniform sample set is needed to evaluate the quality of these methods and their results. The sample network obtained by the UNI method meets this requirement: it performs unbiased, uniform sampling based on acceptance-rejection sampling. However, the method has a limitation: it is only suitable for networks whose user IDs are 32-bit integers. Most online social networks have now upgraded their user ID systems to 64-bit integers, which drives the sampling hit rate of the otherwise well-performing UNI method to almost zero in the 64-bit ID space and makes the method unusable. This thesis uses statistical methods to analyze in detail the distribution of 64-bit user IDs in online social networks; the results show that the user IDs are distributed neither uniformly nor randomly. Based on this analysis and on the idea of adaptation, the UNI method is improved, and an efficient, unbiased, uniform adaptive sampling method for 64-bit integer user ID systems is designed and implemented, called the "adaptive UNI64 method". Finally, the method is validated experimentally on a Sina Weibo dataset. The results show that the adaptive UNI64 method can sample in the 64-bit integer ID space, that its hit rate and sampling efficiency are much higher than those of the UNI method, and that the distribution of valid IDs in the resulting sample network is consistent with the actual distribution.
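The accept-reject step that the abstract attributes to UNI, and the collapse of its hit rate in a 64-bit ID space, can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the thesis's implementation: the function names `uni_sample` and `range_restricted_sample`, the toy ID population, and the idea of proposing candidates only from ranges already known to be populated are all assumptions made for this example. The abstract does not describe how adaptive UNI64 actually adapts.

```python
import random


def uni_sample(id_exists, id_bits, n_samples, max_tries, rng):
    """Accept-reject sampling over the full ID space [0, 2**id_bits).

    Each candidate is drawn uniformly, so every existing ID is equally
    likely to be accepted. Samples are drawn with replacement.
    """
    space = 1 << id_bits
    accepted, tries = [], 0
    while len(accepted) < n_samples and tries < max_tries:
        candidate = rng.randrange(space)
        tries += 1
        if id_exists(candidate):
            accepted.append(candidate)
    return accepted, tries


def range_restricted_sample(id_exists, ranges, n_samples, max_tries, rng):
    """Same accept-reject step, but candidates come only from `ranges`.

    A range is picked with probability proportional to its length and an
    ID is then drawn uniformly inside it, so every ID in the union of the
    ranges is proposed with equal probability. Uniformity over existing
    IDs holds only if the ranges are disjoint and cover all existing IDs.
    """
    lengths = [hi - lo for lo, hi in ranges]
    accepted, tries = [], 0
    while len(accepted) < n_samples and tries < max_tries:
        lo, hi = rng.choices(ranges, weights=lengths, k=1)[0]
        candidate = rng.randrange(lo, hi)
        tries += 1
        if id_exists(candidate):
            accepted.append(candidate)
    return accepted, tries


# Synthetic population (hypothetical): 2,000 IDs clustered in two narrow
# ranges of a 64-bit space, mimicking a non-uniform ID distribution.
rng = random.Random(0)
ranges = [(10**15, 10**15 + 10_000), (3 * 10**18, 3 * 10**18 + 10_000)]
valid_ids = set()
for lo, hi in ranges:
    valid_ids.update(rng.sample(range(lo, hi), 1_000))

full_hits, full_tries = uni_sample(valid_ids.__contains__, 64, 200, 20_000, rng)
near_hits, near_tries = range_restricted_sample(
    valid_ids.__contains__, ranges, 200, 20_000, rng
)

print("full 64-bit space hit rate:", len(full_hits) / full_tries)
print("range-restricted hit rate: ", len(near_hits) / near_tries)
```

In this toy setup the expected hit rate over the full space is 2,000 / 2^64, so a budget of 20,000 probes is far too small to find a user, while restricting proposals to the two populated ranges raises the expected hit rate to about 10%. In a real crawl, `id_exists` would be a query against the social network, and the populated ranges would have to be estimated rather than known in advance.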
[Degree-granting institution]: Beijing University of Chemical Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP393.09
Article ID: 1973557
Article link: http://sikaile.net/guanlilunwen/ydhl/1973557.html