Research on Sentiment Recognition and Classification Techniques for Chinese Microblog Text
Topic: microblog content analysis + subjective sentence recognition. Source: master's thesis, Central China Normal University, 2014
[Abstract]: As one of the most popular social media platforms today, microblogging is characterized by fast information propagation, large information volume, and loosely standardized content, and it has become one of the important platforms for exchanging and sharing information on the Internet. Sentiment recognition and classification of microblog text is gradually becoming a new research hotspot, and a difficult problem, in natural language processing. Its results have practical value for applications such as helping enterprises promptly understand user feedback on their products or services, gauging public opinion, and monitoring public sentiment.
This study aims to provide an initial solution to subjective sentence recognition and sentiment classification for Chinese microblog text. The specific research content is as follows:
First, by analyzing microblog text, we summarize several of its structural features and build an emoticon sentiment library. Based on an analysis of the repeated punctuation marks that frequently appear in microblog text, we compile a punctuation sentiment library that assists sentiment classification. Combining the sentiment vocabulary ontology with the emoticon sentiment library and the punctuation sentiment library, we construct a sentiment feature library for Chinese microblog text.
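As a rough illustration of the first point, the Python sketch below merges a word lexicon, an emoticon library, and a repeated-punctuation library into a single feature extractor. The lexicon entries, the polarity labels, the bracketed `[name]` emoticon notation, the feature names, and the function `extract_sentiment_features` are all assumptions made for illustration; the abstract does not list the contents of the actual libraries.

```python
import re
from collections import Counter

# Hypothetical miniature libraries; real ones would hold far more entries.
WORD_LEXICON = {"喜欢": "positive", "开心": "positive", "讨厌": "negative", "失望": "negative"}
EMOTICON_LIBRARY = {"[哈哈]": "positive", "[泪]": "negative", "[怒]": "negative"}

# Assumption: emoticons appear in the raw text as short bracketed names, e.g. "[哈哈]".
EMOTICON_PATTERN = re.compile(r"\[[^\[\]\s]{1,8}\]")
# Runs of two or more exclamation or question marks (ASCII or full-width).
REPEATED_PUNCT_PATTERN = re.compile(r"[!！?？]{2,}")


def punctuation_feature(run: str) -> str:
    """Map a run of repeated punctuation to a coarse feature name."""
    has_exclaim = any(ch in "!！" for ch in run)
    has_question = any(ch in "?？" for ch in run)
    if has_exclaim and has_question:
        return "PUNCT_MIXED"
    return "PUNCT_EXCLAIM" if has_exclaim else "PUNCT_QUESTION"


def extract_sentiment_features(text: str) -> Counter:
    """Count lexicon words, known emoticons, and repeated-punctuation runs in one post."""
    features = Counter()
    for emoticon in EMOTICON_PATTERN.findall(text):
        if emoticon in EMOTICON_LIBRARY:
            features["EMO_" + EMOTICON_LIBRARY[emoticon]] += 1
    for run in REPEATED_PUNCT_PATTERN.findall(text):
        features[punctuation_feature(run)] += 1
    # Plain substring matching stands in for a real Chinese word segmenter.
    for word, polarity in WORD_LEXICON.items():
        count = text.count(word)
        if count:
            features["WORD_" + polarity] += count
    return features
```

Calling `extract_sentiment_features("今天好开心[哈哈]!!!")` yields one count each for a positive word, a positive emoticon, and an exclamation run. Keeping the three sources under separate feature prefixes lets a downstream classifier weight them independently.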
Second, we extract sentiment features from microblog text using word frequency statistics, expected cross entropy, TF-IDF, and the variance of TF-IDF. The experimental results show that the feature recognition and extraction method that combines variance with TF-IDF weighting achieves the best results.
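The abstract does not give the exact formula for combining variance with TF-IDF, so the following Python sketch shows only one plausible reading: each term is scored by its overall mean TF-IDF multiplied by the variance of its per-class mean TF-IDF, so that terms that are both prominent and unevenly distributed across classes rank highest. The scoring rule and the function names `tfidf_per_document`, `variance_weighted_scores`, and `top_features` are assumptions, not the thesis's actual method.

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pvariance


def tfidf_per_document(docs):
    """Return one {term: tf-idf} dict per tokenised document."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def variance_weighted_scores(docs, labels):
    """Assumed rule: (overall mean TF-IDF) * (variance of per-class mean TF-IDF)."""
    weights = tfidf_per_document(docs)
    classes = sorted(set(labels))
    vocabulary = sorted({term for doc in docs for term in doc})
    scores = {}
    for term in vocabulary:
        per_class = defaultdict(list)
        for doc_weights, label in zip(weights, labels):
            per_class[label].append(doc_weights.get(term, 0.0))
        class_means = [mean(per_class[c]) for c in classes]
        overall_mean = mean([w.get(term, 0.0) for w in weights])
        scores[term] = overall_mean * pvariance(class_means)
    return scores


def top_features(docs, labels, k):
    """Return the k highest-scoring terms, ties broken by term order."""
    scores = variance_weighted_scores(docs, labels)
    return sorted(scores, key=lambda term: (-scores[term], term))[:k]
```

Under this rule, a term that occurs equally often in every class gets a score of zero, however frequent it is, while a term concentrated in one class is kept.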
Third, for sentiment recognition and classification, we first determine whether a microblog text is subjective or objective, using the Naive Bayes method and the support vector machine (SVM) method to identify subjective sentences. The experimental results show that Naive Bayes identifies subjective sentences more effectively. We then study sentiment classification of the microblog texts identified as subjective, using SVM-based one-vs-one and one-vs-rest classification. The experimental results show that the SVM-based one-vs-one method performs better.
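To make the subjectivity step concrete, here is a minimal multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. It is a sketch of the general technique, not the thesis's implementation: the class name `NaiveBayesClassifier`, the pre-tokenised input format, and the choice of add-one smoothing are all assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        """Collect class priors and per-class term counts from tokenised documents."""
        self.class_doc_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.vocabulary = {term for doc in docs for term in doc}
        self.n_docs = len(docs)
        return self

    def log_posterior(self, doc, label):
        """Unnormalised log P(label | doc): log prior plus summed log likelihoods."""
        score = math.log(self.class_doc_counts[label] / self.n_docs)
        total_terms = sum(self.term_counts[label].values())
        denominator = total_terms + len(self.vocabulary)
        for term in doc:
            if term in self.vocabulary:  # terms unseen in training are ignored
                score += math.log((self.term_counts[label][term] + 1) / denominator)
        return score

    def predict(self, doc):
        """Return the class with the highest log posterior."""
        candidates = sorted(self.class_doc_counts)
        return max(candidates, key=lambda label: self.log_posterior(doc, label))
```

A possible usage: fit the classifier on tokenised posts labelled `"subjective"` or `"objective"`, then pass only the posts predicted as subjective to the sentiment classification stage.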
Fourth, based on the sentiment feature extraction method and the sentiment recognition and classification methods proposed above, we build a corresponding prototype system. A series of experiments on a public evaluation dataset verifies the feasibility and effectiveness of the proposed methods.
[Degree-granting institution]: Central China Normal University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP391.1; TP393.092
Article number: 1780447
Article link: http://sikaile.net/guanlilunwen/ydhl/1780447.html