Research on Weibo Hot Event Detection Based on Topic Models and Community Detection
Published: 2018-07-17 03:56
[Abstract]: With its simple and fast mechanisms for generating and spreading information, Weibo, an emerging social networking service, has become ubiquitous in the Web 2.0 era. Compared with traditional media, Weibo reports and spreads news events more promptly and efficiently, so hot event detection based on Weibo data has become an active research topic in recent years. Several characteristics of Weibo, however, make this task challenging. First, the Weibo data stream contains a large number of worthless, meaningless "noise" posts; effectively separating the event posts of interest from this noise is the primary challenge of Weibo hot event detection. Second, a post contains no more than 140 characters, so its text is extremely sparse, and it often contains spelling and grammatical errors, mixed languages, and similar irregularities. As a result, traditional text analysis techniques cannot be applied directly to Weibo event detection.
This thesis first surveys existing Weibo hot event detection techniques in China and abroad and then, addressing their shortcomings, studies and extends hot event detection on both static and dynamic Weibo data. For static event detection, it proposes a text classification method based on a topic model and a Bayesian method to detect event posts in static Weibo data. The method maps the static Weibo data into a topic-space representation, mines the relationship between topics and text types, and then decides the category of a post according to whether the category of its topic is the event class. For dynamic event detection, it proposes a method based on community detection and graph kernel computation. The method first selects event words using a dynamic event word selection algorithm proposed in the thesis. Then, for each time slice, it builds the posts in the real-time Weibo data stream into a Weibo semantic graph according to the event words they contain: the posts are the nodes, and an edge connects two nodes when they share an event word. A community detection algorithm then finds the event communities in the semantic graph of each time slice and returns the key-node post of each community as the description of the event that the community reflects. The thesis also proposes an encoding scheme based on topic semantics that assigns each node in an event community graph a bit-array label, yielding a labeled event community graph. Finally, a graph kernel algorithm computes the similarity of the labeled event community graphs in adjacent time slices, and communities describing the same event are matched according to the result, which achieves event tracking. Using Chinese Weibo data crawled in real time as experimental data, the thesis applies both methods to detect Weibo hot events; the experimental results show that both methods achieve the expected results.
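The per-time-slice graph construction described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the post IDs, tokens, and event words are made up, and because the abstract does not name its community detection algorithm or its key-node criterion, the sketch substitutes connected components for community detection and the highest in-community degree for the key node.

```python
from collections import defaultdict
from itertools import combinations


def build_semantic_graph(posts, event_words):
    """Return an adjacency map: post id -> ids of posts sharing an event word."""
    graph = {pid: set() for pid in posts}
    by_word = defaultdict(set)
    # Index the posts of this time slice by the event words they contain.
    for pid, tokens in posts.items():
        for word in set(tokens) & event_words:
            by_word[word].add(pid)
    # Connect every pair of posts that share at least one event word.
    for pids in by_word.values():
        for a, b in combinations(sorted(pids), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def find_communities(graph):
    """Stand-in for community detection: connected components of size >= 2."""
    seen, communities = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) > 1:  # isolated posts are treated as noise
            communities.append(component)
    return communities


def key_node(graph, community):
    """Assumed criterion: most neighbours inside the community, ties by id."""
    return min(community, key=lambda pid: (-len(graph[pid] & community), pid))


# Hypothetical, already-tokenized posts from one time slice.
posts = {
    1: ["earthquake", "sichuan", "rescue"],
    2: ["earthquake", "news"],
    3: ["rescue", "team", "arrives"],
    4: ["concert", "tickets"],
    5: ["concert", "tonight"],
    6: ["lunch", "noodles"],
}
event_words = {"earthquake", "rescue", "concert"}

graph = build_semantic_graph(posts, event_words)
communities = find_communities(graph)
summaries = {key_node(graph, c): sorted(c) for c in communities}
print(summaries)
```

In this toy slice, post 6 contains no event word and stays isolated, which mirrors how "noise" posts drop out of the event communities. A real system would replace the component search with a proper community detection algorithm and add the bit-array labeling and graph kernel matching across adjacent time slices.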
[Degree-granting institution]: Southwest University
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.092; TP391.1
Article ID: 2128903
Article link: http://sikaile.net/guanlilunwen/ydhl/2128903.html