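Research on Microblog Sentiment Classification Based on Recurrent Neural Networks
Topic: microblog text; Focus: sentiment classification; Source: Zhejiang Sci-Tech University, master's thesis, 2017; Type: degree thesis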
[Abstract]: As a social networking platform that has grown rapidly in recent years, microblogging has been widely embraced by users for its ease of use, fast dissemination, and high flexibility. Although the content users post is highly varied, observation and analysis show that it conceals a large amount of useful information. In particular, the sentiment expressed in microblog text can help governments and enterprises understand public needs, guide public opinion, identify business opportunities, and increase revenue. Sentiment classification of microblog text is therefore attracting growing attention from researchers, and learning deep semantics, representing text features effectively, and improving classification performance remain central goals of the field. This thesis studies two aspects of microblog sentiment classification: subjective/objective classification and sentiment polarity classification. For subjective/objective classification, it proposes a method that combines a dictionary with corpus statistics. For polarity classification, it studies feature extraction and the classification algorithm separately: for feature extraction, it proposes a feature fusion method based on shallow and deep learning; for the classification algorithm, it proposes a sentiment classification method based on an improved recurrent neural network. The main work and contributions are as follows:
(1) For subjective/objective classification of microblog text, a method combining a dictionary with corpus statistics is proposed. A reliable sentiment dictionary constructed in this thesis is first used to identify subjective texts with high confidence, and corpus statistics are then used to classify the remaining texts as subjective or objective. The resulting F1 score is 6.72% higher than that of the traditional method based on a large-scale sentiment dictionary.
(2) Because ordinary shallow learning features ignore the intrinsic semantics of the text, a feature fusion method based on shallow and deep learning is proposed. The shallow features comprise three types (word, part-of-speech, and dictionary features), the deep features are extracted with the word2vec tool, and the two sets are then fused. Experiments show that polarity classification with the fused features outperforms classification with either type of feature alone.
(3) For sentiment polarity classification of microblog text, an improved recurrent neural network model is adopted. The model replaces the hidden layer of an ordinary recurrent neural network with an LSTM structure, so that classification not only takes into account the dependencies between neighboring elements of the text sequence but can also learn long-distance dependencies in the text. The model reaches a classification accuracy of 85.04%, an improvement of 3.17% over the traditional support vector machine method based on shallow learning features.
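The two-stage idea in contribution (1) can be illustrated with a small sketch. The abstract does not give the dictionary contents or the exact corpus statistic, so everything below is an assumption made for illustration: the lexicon entries, the function names, and the use of a smoothed log-odds score as the corpus-based fallback are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical high-precision sentiment lexicon (illustrative entries only).
RELIABLE_LEXICON = {"love", "hate", "awful", "wonderful"}

def train_corpus_stats(labeled_texts):
    """Count word occurrences per class from (tokens, label) pairs."""
    counts = {"subjective": Counter(), "objective": Counter()}
    for tokens, label in labeled_texts:
        counts[label].update(tokens)
    return counts

def corpus_score(tokens, counts):
    """Sum of add-one-smoothed log-odds; positive means 'looks subjective'.

    This statistic is an assumed stand-in, not the thesis's actual measure.
    """
    vocab = set(counts["subjective"]) | set(counts["objective"])
    n_subj = sum(counts["subjective"].values())
    n_obj = sum(counts["objective"].values())
    score = 0.0
    for tok in tokens:
        p_subj = (counts["subjective"][tok] + 1) / (n_subj + len(vocab))
        p_obj = (counts["objective"][tok] + 1) / (n_obj + len(vocab))
        score += math.log(p_subj / p_obj)
    return score

def classify_subjectivity(tokens, counts):
    # Stage 1: a hit in the reliable lexicon is taken as a confident decision.
    if any(tok in RELIABLE_LEXICON for tok in tokens):
        return "subjective"
    # Stage 2: the remaining texts are decided by corpus statistics.
    return "subjective" if corpus_score(tokens, counts) > 0 else "objective"
```

The design point is that the lexicon stage favors precision (few, trustworthy entries), while the statistical stage covers texts the lexicon misses.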
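The feature fusion in contribution (2) might look roughly like the sketch below. The abstract does not say how the features are fused, so simple concatenation is assumed here, and averaging word vectors is likewise an assumption. The vocabulary, tag set, polarity lists, and embedding table are toy placeholders; a real system would use a trained word2vec model and a part-of-speech tagger.

```python
# Hypothetical toy resources standing in for a real vocabulary, POS tagger
# output, sentiment dictionary, and trained word2vec vectors.
VOCAB = ["good", "bad", "movie"]
POS_TAGS = ["ADJ", "NOUN"]
POSITIVE = {"good"}
NEGATIVE = {"bad"}
EMBEDDINGS = {"good": [0.9, 0.1], "bad": [-0.8, 0.2], "movie": [0.0, 0.5]}
EMBED_DIM = 2

def shallow_features(tagged_tokens):
    """Word counts, POS-tag counts, and dictionary hit counts."""
    words = [w for w, _ in tagged_tokens]
    tags = [t for _, t in tagged_tokens]
    word_part = [float(words.count(v)) for v in VOCAB]
    pos_part = [float(tags.count(t)) for t in POS_TAGS]
    lex_part = [
        float(sum(w in POSITIVE for w in words)),
        float(sum(w in NEGATIVE for w in words)),
    ]
    return word_part + pos_part + lex_part

def deep_features(tagged_tokens):
    """Average of the word vectors of all in-vocabulary tokens."""
    vectors = [EMBEDDINGS[w] for w, _ in tagged_tokens if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * EMBED_DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def fused_features(tagged_tokens):
    # Assumed fusion strategy: concatenate shallow and deep feature vectors.
    return shallow_features(tagged_tokens) + deep_features(tagged_tokens)
```

The fused vector can then be passed to any classifier; the abstract reports only that the fused features outperform either feature type used alone.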
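To make contribution (3) concrete, here is a minimal forward pass of a standard LSTM cell in plain Python. It shows how the cell state carries information across time steps, which is what lets the network pick up long-distance dependencies. This is the textbook LSTM formulation, not necessarily the thesis's exact "improved" model, and the parameter layout and function names are assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, x, b):
    """Return W @ x + b for a list-of-rows matrix W."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def init_params(input_dim, hidden_dim, seed=0):
    """One (weights, bias) pair per gate: input, forget, output, candidate."""
    rng = random.Random(seed)
    def mat():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    return {g: (mat(), [0.0] * hidden_dim) for g in ("i", "f", "o", "g")}

def lstm_step(x, h_prev, c_prev, params):
    z = list(x) + list(h_prev)  # concatenation [x; h_prev]
    def gate(name, act):
        W, b = params[name]
        return [act(v) for v in affine(W, z, b)]
    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell values
    # The cell state mixes the retained old memory with the new candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def encode(sequence, params, hidden_dim):
    """Run the LSTM over a sequence of input vectors; return the last h."""
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h
```

In a classifier, the final hidden state returned by `encode` would typically be fed to a softmax or logistic layer to predict polarity; training (backpropagation through time) is omitted here.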
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1; TP183
Article ID: 1622610
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1622610.html