Research on Microblog User Behavior Analysis and Network Structure Evolution
Topics: microblog networks + user behavior; Source: Beijing Jiaotong University, doctoral dissertation, 2014
[Abstract]: With the rapid development of the Internet, and especially the mobile Internet, microblogging has become a very important form of online social network. In microblog networks, user access is more convenient and varied, interaction is more flexible and faster, and information spreads more rapidly and widely; user behavior and network structure are the two key factors affecting the information dissemination process. In view of this, this dissertation adopts interdisciplinary ideas and methods to study the characteristics and models of microblog user behavior, the formation mechanism and growth patterns of the distributions of user characteristic quantities, network centrality and the measurement of information dissemination, and the topological features and evolution models of the network. It attempts to discover the behavior patterns of microblog users and the laws of network structure evolution, to build mathematical models that capture these laws, and to find strategies for predicting user behavior. The work helps to understand the behavioral characteristics of microblog users, deepens the understanding of the relationship between microblog network structure and information dissemination, and provides some exploratory results for theoretical research on complex networks and social networks.
The research was supported by the National Natural Science Foundation of China (Nos. 61172072 and 61271308), the Beijing Natural Science Foundation (No. 4112045), and the Graduate Innovation Project of the Fundamental Research Funds for the Central Universities (No. 2011YJS215). The main work and contributions are as follows:
1. The distributions of microblog user characteristic quantities and the regularities of users' posting behavior are studied, and a behavioral model of microblog posting is established. Empirical analysis shows that the characteristic quantities of Sina Weibo users follow different power-law distributions and are correlated with one another to different degrees. The intervals between posts follow a power-law distribution for both individual users and groups of users, and the power-law exponent is proportional to user activity. Users' interest in posting is influenced by the interactions of other users and shows clear periodicity, and posting behavior is self-similar. The dissertation analyzes a posting model driven jointly by social interaction and interest, proposes a posting model in which user interest decays according to a logistic function, and uses simulations of this model to verify the distribution of inter-post intervals. This research helps to understand the behavior of microblog users more deeply and provides a theoretical basis and a formal reference for further research on microblog network structure and information dissemination patterns.
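The abstract does not say how the power-law exponents of the inter-post intervals were estimated. As an illustration only, the sketch below uses the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(x_i / x_min)), on synthetic intervals. The function names, the choice of `x_min`, and the synthetic data are assumptions for this example and are not taken from the dissertation.

```python
import math
import random


def sample_power_law(n, alpha, x_min, rng):
    """Draw n samples from a continuous power law p(x) ~ x^(-alpha), x >= x_min."""
    # Inverse-transform sampling: x = x_min * (1 - u)^(-1 / (alpha - 1)).
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_power_law_exponent(intervals, x_min):
    """Maximum-likelihood estimate of alpha using only the tail x >= x_min."""
    tail = [x for x in intervals if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0.0:
        raise ValueError("all observations equal x_min; exponent is undefined")
    return 1.0 + len(tail) / log_sum


# Synthetic stand-in for inter-post intervals (hypothetical data, not Sina Weibo).
rng = random.Random(42)
intervals = sample_power_law(20000, alpha=2.5, x_min=1.0, rng=rng)
alpha_hat = fit_power_law_exponent(intervals, x_min=1.0)
print(round(alpha_hat, 2))
```

With real data, the result depends strongly on the chosen `x_min`, so the estimate should be checked against a plot of the empirical distribution.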
2. The formation mechanism and growth patterns of the distributions of user characteristic quantities are studied. Fitting these distributions with the double Pareto lognormal (DPLN) distribution gives better results than fitting with the lognormal or power-law distribution. User active time follows an exponential distribution; the characteristic quantities of users with different active times are all approximately lognormally distributed; and the growth rate of a characteristic quantity is lognormally distributed and independent of the quantity's own size. The DPLN model is therefore used to explain how the two-segment power law of user characteristic quantities forms. Based on a K-means clustering algorithm that uses vector cosine distance as the similarity measure, a method is proposed for analyzing the growth patterns of user characteristic quantities, and it is applied to cluster time series of actual user characteristic quantities with different rankings and initial sizes. The causes of explosive growth in follower counts are analyzed, and allometric growth is found between user characteristic quantities and the number of users.
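The clustering step can be sketched as follows. This is a minimal K-means variant that assigns each series to the centroid with the highest cosine similarity, so that growth curves are grouped by shape rather than by absolute size. The initialization (first k series), the update rule (normalized mean) and the toy series are assumptions made for illustration; the dissertation's actual procedure may differ.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return [x / norm for x in vector]


def cosine_kmeans(series, k, iterations=50):
    """Cluster time series by direction; returns one cluster label per series."""
    points = [normalize(v) for v in series]
    # Naive initialization (assumption): use the first k series as centroids.
    centroids = [points[i][:] for i in range(k)]
    labels = [None] * len(points)
    for _ in range(iterations):
        new_labels = []
        for p in points:
            similarities = [sum(a * b for a, b in zip(p, c)) for c in centroids]
            new_labels.append(similarities.index(max(similarities)))
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                if any(mean):
                    centroids[c] = normalize(mean)
    return labels


# Hypothetical follower-count series: two steady growers and two late bursts,
# each pair at very different absolute scales.
series = [
    [1, 2, 3, 4, 5],
    [1, 1, 1, 1, 10],
    [10, 20, 30, 40, 50],
    [2, 2, 2, 2, 21],
]
labels = cosine_kmeans(series, k=2)
print(labels)
```

Because every series is normalized first, a user with ten times as many followers but the same growth shape lands in the same cluster, which matches the stated goal of comparing users with different initial sizes.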
3. The centrality of nodes in the microblog network is analyzed and a method for measuring user influence is proposed. From real Sina Weibo user data, two user relationship networks based on mutual "following" are constructed. Analysis of their topological statistics shows that both networks are small-world and scale-free. Four centrality measures (degree, closeness, betweenness and k-core) and their correlations are then analyzed for each network. On this basis, an SIR information propagation model based on epidemic dynamics is used to analyze how initial spreader nodes with different centrality values affect the speed and range of information propagation in the two networks. The results show that closeness and k-core describe a node's core position in the network during information propagation more accurately than the other measures. Further analysis shows that these two measures help to identify key nodes in the information propagation topology. The method can provide theoretical support for applications such as microblog marketing, user recommendation and online public-opinion analysis.
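The seed-comparison experiment can be illustrated with a small discrete-time SIR simulation. The sketch below is an assumption-laden toy, not the dissertation's setup: the update rule (each infected node infects each susceptible neighbor with probability `beta`, then recovers with probability `gamma`), the star-shaped test graph and all parameter values are chosen only to show how the final outbreak size depends on the seed node.

```python
import random


def sir_spread(adjacency, seed, beta, gamma, rng, max_steps=1000):
    """Discrete-time SIR on an undirected graph; returns how many nodes were ever infected."""
    infected = {seed}
    recovered = set()
    for _ in range(max_steps):
        if not infected:
            break
        newly_infected = set()
        for node in infected:
            for neighbor in adjacency[node]:
                susceptible = (
                    neighbor not in infected
                    and neighbor not in recovered
                    and neighbor not in newly_infected
                )
                if susceptible and rng.random() < beta:
                    newly_infected.add(neighbor)
        newly_recovered = {node for node in infected if rng.random() < gamma}
        infected = (infected - newly_recovered) | newly_infected
        recovered |= newly_recovered
    return len(recovered) + len(infected)


def mean_outbreak(adjacency, seed, beta, gamma, runs, rng):
    """Average final outbreak size over repeated runs from the same seed."""
    total = sum(sir_spread(adjacency, seed, beta, gamma, rng) for _ in range(runs))
    return total / runs


# Hypothetical star graph: node 0 is the hub, nodes 1..10 are leaves.
star = {0: set(range(1, 11))}
for leaf in range(1, 11):
    star[leaf] = {0}

rng = random.Random(7)
mean_from_hub = mean_outbreak(star, 0, beta=0.3, gamma=1.0, runs=2000, rng=rng)
mean_from_leaf = mean_outbreak(star, 1, beta=0.3, gamma=1.0, runs=2000, rng=rng)
print(mean_from_hub > mean_from_leaf)
```

On a real network, the same loop could be run once per candidate seed, with seeds grouped by degree, closeness, betweenness or k-core value, and the mean outbreak sizes compared across groups.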
4. A network evolution model based on community structure and mixed connection mechanisms is proposed. Further analysis of the topology of the two mutual-following networks shows that both are disassortative, with hierarchical organization and community structure, and that their community sizes follow an exponential distribution. Then, given that the number of mutual followings per user is approximately lognormally distributed, and given the structural features and generation mechanism of real mutual-following networks, a network generation model based on community structure and mixed connections is proposed. Its mixed connection mechanism is as follows: new nodes connect within a community either by preferential attachment with lognormally distributed fitness or by random attachment; existing nodes, after preferential selection within a community, connect either to nearby neighbors or globally. Simulation results show that the degree distribution, clustering coefficient, degree correlation, shortest path length and community structure of the generated networks agree well with the real networks, and that networks with different degree distributions and clustering coefficients can be generated by tuning the parameters.
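The abstract gives only the outline of the generation model, so the following is a loose sketch of one way the mixed mechanism could be arranged, not the dissertation's algorithm. Everything specific here is an assumption: the fixed number of communities, the attachment weight `fitness * (degree + 1)`, the use of friends-of-friends for neighbor linking, and the parameter names `p_pref`, `p_existing` and `p_local`.

```python
import random


def grow_network(n_nodes, n_communities, m, p_pref, p_existing, p_local, rng):
    """Grow an undirected network with communities and mixed attachment rules."""
    adjacency, fitness, community = {}, {}, {}
    members = [[] for _ in range(n_communities)]

    def add_edge(u, v):
        if u != v:  # no self-loops; sets ignore duplicate edges
            adjacency[u].add(v)
            adjacency[v].add(u)

    def preferential_pick(candidates):
        # Assumed weight: lognormal fitness times (degree + 1).
        weights = [fitness[v] * (len(adjacency[v]) + 1) for v in candidates]
        return rng.choices(candidates, weights=weights, k=1)[0]

    for node in range(n_nodes):
        # Seed each community with one node, then assign at random.
        c = node if node < n_communities else rng.randrange(n_communities)
        adjacency[node] = set()
        fitness[node] = rng.lognormvariate(0.0, 1.0)
        community[node] = c
        peers = members[c][:]  # community members that existed before this node
        members[c].append(node)

        # New-node step: up to m links inside the community, preferential or random.
        for _ in range(min(m, len(peers))):
            if rng.random() < p_pref:
                add_edge(node, preferential_pick(peers))
            else:
                add_edge(node, rng.choice(peers))

        # Existing-node step: a preferentially chosen member adds one more link.
        if peers and rng.random() < p_existing:
            source = preferential_pick(peers)
            if rng.random() < p_local:
                # Neighbor linking: connect to a friend of a friend, if any.
                second = [
                    w
                    for v in adjacency[source]
                    for w in adjacency[v]
                    if w != source and w not in adjacency[source]
                ]
                if second:
                    add_edge(source, rng.choice(second))
            else:
                # Global linking: connect to any existing node.
                add_edge(source, rng.randrange(node + 1))
    return adjacency, community


rng = random.Random(1)
adjacency, community = grow_network(
    n_nodes=500, n_communities=5, m=3,
    p_pref=0.7, p_existing=0.5, p_local=0.6, rng=rng,
)
degrees = sorted((len(neighbors) for neighbors in adjacency.values()), reverse=True)
print(degrees[:5])
```

In this sketch, raising `p_local` should increase clustering through triangle closure, while `p_pref` controls how heavy the degree tail becomes; the parameters the dissertation actually tunes are not named in the abstract.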
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Doctoral
[Year degree granted]: 2014
[Classification number]: TP393.092; F206