Research on Burst Topic Detection Techniques for Weibo Based on Text Content

Published: 2018-06-23 15:44

  Thesis topic: Weibo + burst. Source: master's thesis, Hangzhou Dianzi University, 2014


[Abstract]: The openness and convenience of Weibo have made it an important platform for the spread of online public opinion. However, the sheer volume of posts and the speed at which they propagate make it difficult to collect and manage online public opinion. Detecting burst topics in the Weibo information stream promptly and accurately is therefore both a difficult and an active research problem. This thesis studies two key techniques in Weibo burst topic detection: the detection of emergent topic words and the detection of opinion words. The work has three main parts.

First, to improve the precision and recall of topic detection, a content-search-based method for detecting emergent topic words is proposed. Using bursty keywords as clues, the Lucene retrieval tool merges the Weibo posts related to a bursty keyword into a single text document, and the traditional TF-IDF method then extracts the topic words from that document. Experiments show that when the number of detected topic words reaches eight and ten, the F-measure (the trade-off between precision and recall) is 0.87 and 0.84, respectively, and that the average F-measure is 13.2% higher than that of a method based on association rules.

Second, to detect the main opinions expressed in a topic more accurately, an opinion word detection method based on mutual information is proposed. Starting from the sentiment lexicon of Dalian University of Technology, the lexicon is trained, and an improved mutual information method computes the degree of association between topic words and sentiment words in order to find the opinion words most relevant to each topic word. Comparative experiments show that measuring the association between topic words and opinion words with mutual information detects the main opinions expressed in a topic more accurately.
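As an illustration of the mutual-information step described above, the following is a minimal sketch that ranks sentiment words by their pointwise mutual information (PMI) with a topic word, using post-level co-occurrence counts. It uses plain PMI, not the thesis's improved variant, whose details the abstract does not give. The function name `rank_opinion_words`, the tokenized toy posts, and the tiny sentiment list are all assumptions made for the example.

```python
import math


def rank_opinion_words(posts, topic_word, sentiment_words):
    """Rank sentiment words by plain PMI with topic_word.

    posts: list of token lists (one list per post, already segmented).
    PMI(t, s) = log2( P(t, s) / (P(t) * P(s)) ), with probabilities
    estimated as the fraction of posts containing the word(s).
    """
    doc_sets = [set(tokens) for tokens in posts]
    n = len(doc_sets)
    count_topic = sum(topic_word in d for d in doc_sets)
    scores = {}
    for word in sentiment_words:
        count_word = sum(word in d for d in doc_sets)
        count_joint = sum(topic_word in d and word in d for d in doc_sets)
        if count_topic == 0 or count_word == 0 or count_joint == 0:
            continue  # PMI is undefined without any co-occurrence
        scores[word] = math.log2(count_joint * n / (count_topic * count_word))
    # Highest association first; ties broken alphabetically
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy, hand-made data (hypothetical, not from the thesis)
posts = [
    ["earthquake", "terrible"],
    ["earthquake", "terrible", "rescue"],
    ["earthquake", "sad"],
    ["concert", "great"],
    ["concert", "great"],
    ["weather", "terrible"],
]
ranked = rank_opinion_words(posts, "earthquake", ["terrible", "sad", "great"])
for word, score in ranked:
    print(f"{word}\t{score:.3f}")
```

In this toy corpus "great" never co-occurs with "earthquake", so it is dropped, while "sad" and "terrible" are both positively associated with the topic word.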
The precision and recall of opinion word detection are 0.72 and 0.65, respectively, and the combined evaluation metric, the F-measure, is 0.68, about 5% higher than that of the traditional method.

Finally, building on the two methods above, a system that detects Weibo burst topics online is implemented. On the one hand, the system uses the proposed emergent topic word and opinion word detection methods to implement burst topic detection, which verifies the effectiveness of the methods. On the other hand, it implements Weibo content location and content search, so that users can find the specific posts related to a burst topic.

Taking Weibo text content as its object of study, this thesis proposes a content-search-based method for detecting emergent topic words and a mutual-information-based method for detecting opinion words, and implements an online Weibo burst topic detection system on top of them. The results should help public opinion monitoring users grasp the latest online public opinion more comprehensively and intuitively, and make public opinion monitoring on Weibo more convenient.
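The first method in the abstract (merge the posts retrieved for a bursty keyword into one document, then rank its terms by TF-IDF) can be sketched roughly as follows. This is a sketch under stated assumptions, not the thesis's implementation: a simple token-membership filter stands in for the Lucene search, the IDF is computed against a small hypothetical background corpus, and the names `merge_posts` and `top_topic_words` are invented for the example.

```python
import math
from collections import Counter


def merge_posts(posts, bursty_keyword):
    """Concatenate the tokens of every post that contains the keyword.

    Stand-in for the Lucene retrieval step (assumption for this sketch).
    """
    merged = []
    for tokens in posts:
        if bursty_keyword in tokens:
            merged.extend(tokens)
    return merged


def top_topic_words(merged_doc, background_docs, k):
    """Return the k terms of merged_doc with the highest TF-IDF score."""
    if not merged_doc:
        return []
    tf = Counter(merged_doc)
    total = len(merged_doc)
    background_sets = [set(doc) for doc in background_docs]
    n_docs = len(background_sets) + 1  # background documents plus the merged one
    scores = {}
    for term, count in tf.items():
        # Document frequency: the merged document itself plus background hits
        df = 1 + sum(term in doc for doc in background_sets)
        scores[term] = (count / total) * math.log(n_docs / df)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Toy, hand-made data (hypothetical, not from the thesis)
posts = [
    ["earthquake", "magnitude", "rescue", "today"],
    ["earthquake", "magnitude", "today"],
    ["lunch", "today"],
]
background = [
    ["lunch", "today", "weather"],
    ["today", "traffic"],
    ["weather", "today", "rain"],
]
merged = merge_posts(posts, "earthquake")
topic_words = top_topic_words(merged, background, k=3)
print(topic_words)
```

Merging the retrieved posts first gives TF-IDF a longer document to work with than a single short post, and the everyday word "today" scores zero here because it appears in every background document.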
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.092; TP391.1


