Node Selection Strategy for a Hot-Event Detection Subnet in Sina Weibo
Published: 2018-10-26 08:18
[Abstract]: Microblogging has become a popular means of exchanging and spreading information. Many social events propagate through microblogs, so detecting hot events in them is increasingly important. Hot-event detection nevertheless faces major challenges: the user base is very large and relatively active, so an enormous number of posts can be produced in a short time, and processing them requires substantial computing power. This thesis implements a hot-event detection system on Sina Weibo, China's largest microblogging service. Because processing in real time every post produced in a given period is not economically feasible, the thesis adopts a strategy of monitoring the posts of only a small fraction of Sina Weibo users, so that hot events can be detected with limited resources. Its main goal is to provide node-selection algorithms for systems that detect hot events by monitoring the nodes of such a subnet. The thesis first introduces the concept of hot-event coverage and proposes a subnet node-selection algorithm that aims to cover all sample hot events. After studying this algorithm and its shortcomings, it introduces the concept of a node's hot-event participation probability and, based on it, proposes a probabilistic algorithm for selecting subnet nodes. Finally, since the overhead of monitoring a node's posts differs from node to node, it introduces the concept of node overhead and combines it with the participation probability to propose an optimization algorithm. A total of 525 hot events were collected, 294 as the training set and 231 as the test set, and each of the three proposed node-selection algorithms was applied to this dataset. The results show that, compared with the other algorithms, the optimization algorithm detects more hot events at lower system overhead, with a hot-event detection rate of 70%.
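The abstract names the ideas behind the selection algorithms (event coverage, node overhead) but does not give their details. As a rough illustration only, the sketch below uses a standard cost-aware greedy set-cover heuristic, which is a stand-in technique and not the thesis's own method. The function names, the `participation` mapping (node to the training events it took part in), the `cost` values, and the `budget` parameter are all hypothetical.

```python
from typing import Dict, List, Set


def greedy_select(participation: Dict[str, Set[str]],
                  cost: Dict[str, float],
                  budget: float) -> List[str]:
    """Greedy cost-aware coverage (assumed heuristic, not the thesis's algorithm).

    Repeatedly picks the node that covers the most not-yet-covered training
    events per unit of monitoring cost, while staying within the budget.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    spent = 0.0
    remaining = dict(participation)
    while remaining:
        best, best_ratio = None, 0.0
        for node, events in remaining.items():
            gain = len(events - covered)          # newly covered events
            if gain == 0 or spent + cost[node] > budget:
                continue                          # useless or unaffordable
            ratio = gain / cost[node]
            if ratio > best_ratio:                # ties keep the earlier node
                best, best_ratio = node, ratio
        if best is None:                          # nothing left worth adding
            break
        chosen.append(best)
        covered |= remaining.pop(best)
        spent += cost[best]
    return chosen


def detection_rate(chosen: List[str],
                   participation: Dict[str, Set[str]],
                   events: Set[str]) -> float:
    """Fraction of the given events in which at least one chosen node took part."""
    if not events:
        return 0.0
    covered = set().union(*(participation[n] for n in chosen))
    return len(covered & events) / len(events)
```

In a setup like the one the abstract describes, `greedy_select` would be run on participation data from the training events, and `detection_rate` would then be evaluated on the held-out test events. Replacing the raw event count in `gain` with a sum of estimated participation probabilities would be one way to move toward a probability-based variant, again as an assumption rather than the thesis's formulation.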
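[Degree-granting institution]: Shanghai Jiao Tong University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP393.092
Article ID: 2295127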
Article link: http://sikaile.net/guanlilunwen/ydhl/2295127.html