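Research on Sentiment Information Extraction Methods for Chinese Microblogs
Published: 2018-03-10 15:27
Topic: Chinese microblogs. Focus: sentiment information extraction. Source: Beijing Information Science and Technology University, master's thesis, 2015. Document type: degree thesis.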
[Abstract]: With the spread of the Internet, the network has become the main channel through which people obtain and share information. Microblogging, as an emerging platform for interactive communication, has gradually become part of people's online life, and research on microblog text is attracting growing attention. Sentiment analysis is an important topic in microblog text analysis, and sentiment information extraction from Chinese microblogs, as the foundational task of Chinese microblog sentiment analysis, has become a popular research direction.
The goal of sentiment information extraction from Chinese microblogs is to convert unstructured sentiment text into structured text, namely sentiment information units. These units can be applied directly to user review analysis and decision support, and can also serve other sentiment analysis tasks such as text sentiment classification. A sentiment information unit consists of four elements: the evaluation object (opinion target), the evaluation word, the polarity, and the opinion holder. However, microblog language is informal: most microblog texts have incomplete syntactic structure and contain a large amount of redundant information and Internet slang, so existing text opinion mining methods do not extract this information well. Existing techniques therefore need to be adapted to the characteristics of microblogs. The main research content covers the following aspects:
(1) Construction of the candidate set of evaluation objects for Chinese microblogs. Based on the characteristics of Chinese microblog text, the text is preprocessed, noun phrases are obtained through syntactic parsing and then post-processed, and a candidate set of evaluation objects is built from nouns, noun phrases, and microblog topics. The experimental results of this step are analyzed.
(2) Filtering of candidate evaluation objects for Chinese microblogs. Three strategies are used. First, an SVM classifier filters the candidates using three features: semantic role information, minimum distance, and word frequency. Second, a weighted model computes a weight score for each candidate from different features and uses the score to decide whether the candidate is a correct evaluation object. Third, because CRF models are well suited to sequence labeling, a CRF model filters the candidates using commonly used sentiment information extraction features together with features such as sentiment words and semantic role labels.
(3) Polarity determination for evaluation objects. If sentiment words occur near an evaluation object, the sentiment word closest to the object is located and the object's polarity is determined from a sentiment lexicon. If no sentiment word occurs nearby, the sentiment polarity of the microblog sentence, obtained from a naive Bayes classifier, is used as the polarity of the evaluation object.
(4) Building on the work above, a sentiment information extraction system for Chinese microblogs is designed and implemented. The system can be used to analyze the experimental results of the candidate set construction method, the candidate filtering methods, and the polarity determination method, and it can also be applied to practical sentiment information extraction tasks.
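The polarity rule described in the abstract (use the nearest sentiment word if one is close to the evaluation object, otherwise fall back to the sentence-level polarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the toy lexicon, the token-distance window of 3, the left-first tie-breaking, and the `sentence_polarity` callable (standing in for the naive Bayes sentence classifier) are all hypothetical choices, and the input is assumed to be already word-segmented.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy lexicon (word -> polarity); the thesis's sentiment lexicon is not given here.
LEXICON: Dict[str, int] = {"喜欢": 1, "好": 1, "差": -1, "讨厌": -1}


def nearest_sentiment_word(tokens: List[str], target_idx: int,
                           lexicon: Dict[str, int], window: int) -> Optional[str]:
    """Return the lexicon word closest to tokens[target_idx] within `window` tokens.

    Returns None when no lexicon word lies inside the window.
    On a distance tie, the word appearing earlier in the sentence wins (an assumption).
    """
    best_word: Optional[str] = None
    best_dist = window + 1  # any accepted word must have distance <= window
    for i, tok in enumerate(tokens):
        dist = abs(i - target_idx)
        if i != target_idx and tok in lexicon and dist < best_dist:
            best_word, best_dist = tok, dist
    return best_word


def target_polarity(tokens: List[str], target_idx: int,
                    lexicon: Dict[str, int],
                    sentence_polarity: Callable[[List[str]], int],
                    window: int = 3) -> int:
    """Polarity of the evaluation object at tokens[target_idx].

    Uses the nearest sentiment word when one is nearby; otherwise falls back to
    the sentence-level polarity supplied by `sentence_polarity`.
    """
    word = nearest_sentiment_word(tokens, target_idx, lexicon, window)
    if word is not None:
        return lexicon[word]
    return sentence_polarity(tokens)
```

For example, with the tokens `["手机", "屏幕", "很", "好", "但", "电池", "差"]`, the object at index 1 takes its polarity from "好" and the object at index 5 from "差". Passing the sentence classifier as a callable keeps the lexicon rule independent of how the sentence-level polarity is produced.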
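[Degree-granting institution]: Beijing Information Science and Technology University
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: TP391.1; TP393.092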