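Research on Sentiment Recognition and Classification Techniques for Chinese Microblog Text
Published: 2018-04-21 02:07
Topics: microblog content analysis + subjective sentence recognition. Source: Central China Normal University, master's thesis, 2014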
[Abstract]: As one of today's most popular social media platforms, microblogging is characterized by fast information propagation, large information volume, and loosely structured, informal content, and it has become an important platform for exchanging and sharing information on the Internet. Sentiment recognition and classification of microblog text is emerging as both a new research hotspot and a difficult problem in natural language processing. Results in this area have practical value for applications such as helping companies promptly understand user feedback on products or services, gauging public opinion, and monitoring public sentiment.
This study aims to provide an initial solution to subjective sentence recognition and sentiment classification for Chinese microblog text. The specific research content is as follows:
First, by analyzing microblog text, we summarize several of its structural features and build an emoticon sentiment lexicon. Based on an analysis of the repeated punctuation marks that frequently appear in microblog text, we compile a punctuation sentiment lexicon that assists sentiment classification. Combining the sentiment vocabulary ontology with the emoticon sentiment lexicon and the punctuation sentiment lexicon, we construct a sentiment feature lexicon for Chinese microblog text.
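To make the idea of combining word, emoticon, and punctuation resources concrete, here is a minimal sketch. The abstract does not list the lexicon entries, so the three toy dictionaries, the function name `extract_sentiment_features`, and the assumption that emoticons appear as short bracketed names are all hypothetical. Substring lookup stands in for real Chinese word segmentation.

```python
import re

# Hypothetical toy lexicons (polarity: +1 positive, -1 negative).
WORD_LEXICON = {"开心": 1, "讨厌": -1}
EMOTICON_LEXICON = {"[哈哈]": 1, "[泪]": -1}
# Hypothetical punctuation lexicon: what a repeated mark is taken to signal.
PUNCT_LEXICON = {"!": "emphasis", "?": "doubt"}

# Assumption: emoticons appear in raw text as short bracketed names, e.g. [哈哈].
EMOTICON_RE = re.compile(r"\[[^\[\]\s]{1,8}\]")
# The same mark repeated two or more times, half-width or full-width.
REPEAT_PUNCT_RE = re.compile(r"([!?!?])\1+")
FULL_TO_HALF = str.maketrans("!?", "!?")


def extract_sentiment_features(text):
    """Collect lexicon hits from one microblog post and sum their polarity."""
    emoticons = [e for e in EMOTICON_RE.findall(text) if e in EMOTICON_LEXICON]
    # Substring lookup is a stand-in for proper word segmentation.
    words = [w for w in WORD_LEXICON if w in text]
    punctuation = [
        PUNCT_LEXICON[m.group(1).translate(FULL_TO_HALF)]
        for m in REPEAT_PUNCT_RE.finditer(text)
    ]
    polarity = sum(WORD_LEXICON[w] for w in words) + sum(
        EMOTICON_LEXICON[e] for e in emoticons
    )
    return {
        "words": words,
        "emoticons": emoticons,
        "punctuation": punctuation,
        "polarity": polarity,
    }
```

Keeping the three resources as separate dictionaries makes it easy to see which source contributed each feature before they are merged into one feature set.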
Second, we extract sentiment features from microblog text using word frequency statistics, expected cross entropy, TF-IDF, and the variance of TF-IDF. The experimental results show that the feature recognition and extraction method that combines variance with TF-IDF weighting achieves the best results.
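The abstract does not give the formula behind the variance-based method, so the following is only one plausible reading: compute TF-IDF per document, then rank each term by the variance of its TF-IDF values across documents. The function names and the choice of population variance are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """docs: list of token lists. Returns (vocab, rows); rows[i][term] is TF-IDF."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        rows.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return sorted(df), rows


def variance_scores(docs):
    """Population variance of each term's TF-IDF value across all documents."""
    vocab, rows = tfidf_rows(docs)
    n = len(docs)
    scores = {}
    for term in vocab:
        values = [row.get(term, 0.0) for row in rows]
        mean = sum(values) / n
        scores[term] = sum((v - mean) ** 2 for v in values) / n
    return scores


def top_features(docs, k):
    """Keep the k terms whose TF-IDF varies most; ties break alphabetically."""
    scores = variance_scores(docs)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Under this reading, a term that occurs in every document gets an IDF of zero and therefore zero variance, so it is ranked last, while terms concentrated in a few documents rise to the top.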
Third, for sentiment recognition and classification, we first determine whether a microblog text is subjective or objective, using Naive Bayes and support vector machine (SVM) methods to identify subjective sentences. The experimental results show that Naive Bayes identifies subjective sentences better. We then perform sentiment classification on the microblog texts identified as subjective, using SVM-based one-vs-one and one-vs-rest classification. The experimental results show that SVM-based one-vs-one classification performs better.
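As an illustration of the subjectivity step, here is a small multinomial Naive Bayes classifier with Laplace smoothing. It is a generic textbook sketch, not the thesis's implementation: the function name `train_nb`, the labels `"subjective"` and `"objective"`, and the pre-tokenized toy inputs in the usage note are assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels, alpha=1.0):
    """Train multinomial Naive Bayes on token lists; returns a predict function."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(docs)
    vocab_size = len(vocab)

    def predict(tokens):
        best_label, best_logp = None, -math.inf
        for label in sorted(class_counts):
            total = sum(term_counts[label].values())
            # Log prior plus smoothed log likelihood of each known token.
            logp = math.log(class_counts[label] / n_docs)
            for t in tokens:
                if t in vocab:  # tokens never seen in training are ignored
                    logp += math.log(
                        (term_counts[label][t] + alpha)
                        / (total + alpha * vocab_size)
                    )
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label

    return predict
```

A usage example with hypothetical, already segmented posts:

```python
docs = [["我", "喜欢", "这", "电影"], ["太", "讨厌", "了"],
        ["今天", "发布", "新", "版本"], ["会议", "明天", "举行"]]
labels = ["subjective", "subjective", "objective", "objective"]
predict = train_nb(docs, labels)
print(predict(["喜欢", "电影"]))
```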
Fourth, based on the sentiment feature extraction method and the sentiment recognition and classification methods proposed above, we build a corresponding prototype system. A series of experiments on a public evaluation dataset verifies the feasibility and effectiveness of the proposed methods.
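The one-vs-one scheme used in the multi-class step trains one binary classifier per pair of classes and predicts by majority vote. The sketch below shows only that decomposition. To stay dependency-free it plugs in a nearest-centroid learner, which is a stand-in and not an SVM; the function names, feature vectors, and class labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def train_centroid(X, y):
    """Stand-in binary learner (NOT an SVM): stores the mean vector per class."""
    counts = Counter(y)
    sums = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def predict(x):
        # Return the class whose centroid is closest in squared distance.
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    return predict


def train_one_vs_one(X, y, train_binary=train_centroid):
    """Train one binary model per class pair; predict by majority vote."""
    models = []
    for a, b in combinations(sorted(set(y)), 2):
        pairs = [(x, label) for x, label in zip(X, y) if label in (a, b)]
        models.append(
            train_binary([x for x, _ in pairs], [label for _, label in pairs])
        )

    def predict(x):
        votes = Counter(model(x) for model in models)
        # On a tie, the class that received a vote first wins.
        return votes.most_common(1)[0][0]

    return predict
```

With k classes this trains k(k-1)/2 pairwise models, whereas one-vs-rest trains k models that each separate one class from all the others. Passing a different `train_binary` function swaps in another binary learner without changing the voting logic.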
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP391.1; TP393.092
Article ID: 1780447
Article link: http://sikaile.net/guanlilunwen/ydhl/1780447.html