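Multi-label Emotion Classification of Weibo Posts Based on a CNN Feature Space
Published: 2018-04-29 20:45
Topics: emotion classification + multi-label classification; Source: Advanced Engineering Sciences (工程科学与技术), 2017, No. 03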
【Abstract】: In the multi-label classification problem of the Weibo emotion evaluation task, traditional text feature representations based on the vector space model struggle to provide effective semantic features. Word embeddings learned with deep learning capture the syntactic and semantic relations between words well, and, following the principle of semantic composition, they can be combined into effective feature vectors for whole sentences. The authors propose a multi-label emotion classification system for Weibo sentences. First, vector representations of Chinese words are learned from a large unlabeled Weibo text corpus. Next, a convolutional neural network (CNN) is trained on supervised multi-emotion classification, and the trained CNN is used to compose the word vectors of each Weibo sentence into a sentence vector. Finally, these sentence vectors serve as features for training multi-label classifiers, which assign the emotion labels to each Weibo post. On the public Weibo emotion evaluation dataset of the 2013 NLPCC (Natural Language Processing and Chinese Computing) conference, the system's best classification performance exceeds the best official evaluation result by 19.16% on the loose metric and 17.75% on the strict metric. Compared with composing sentence vectors with a Recursive Neural Tensor Network (RNTN), the best-performing method in the literature known to the authors, the system improves the two metrics by 3.66% and 2.89%, respectively. Comparing different feature representations across several multi-label classifiers shows that sentence vectors from the CNN feature space have the best emotion-semantic discrimination, and an analysis of the CNN's iterative training process reveals pattern-recognition regularities in semantic composition. Future work includes introducing more suitable deep learning models and further exploring semantic composition phenomena based on word vectors.
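The abstract describes a pipeline in which pretrained word vectors are composed into one sentence vector by a trained CNN, and that vector is then fed to multi-label classifiers. The sketch below illustrates only that data flow, under stated assumptions: a single convolution layer with ReLU and max-over-time pooling, followed by a binary-relevance classifier (one independent logistic scorer per emotion label, thresholded at 0.5). The function names `conv_max_pool` and `predict_labels`, the filter width, the activation, the threshold, and the toy labels and weights are all hypothetical; the paper's actual CNN architecture and its choice of multi-label classifiers are not given here, and real weights would come from training rather than being set by hand.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = List[float]


def conv_max_pool(words: Sequence[Vector], filters: Sequence[Vector],
                  biases: Sequence[float], width: int) -> Vector:
    """Compose word vectors into one sentence vector (hypothetical sketch).

    Each filter spans `width` consecutive words (their vectors concatenated),
    goes through ReLU (an assumed activation), and is max-pooled over all
    window positions, so the output has one entry per filter regardless of
    sentence length.
    """
    if len(words) < width:
        raise ValueError("sentence shorter than filter width")
    windows = []
    for start in range(len(words) - width + 1):
        window: Vector = []
        for vec in words[start:start + width]:
            window.extend(vec)
        windows.append(window)
    features = []
    for w, b in zip(filters, biases):
        if len(w) != len(windows[0]):
            raise ValueError("filter size must equal width * embedding dim")
        # ReLU activation of this filter at every window position.
        acts = [max(0.0, sum(x * y for x, y in zip(win, w)) + b) for win in windows]
        features.append(max(acts))  # max-over-time pooling
    return features


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_labels(sentence_vec: Vector, label_weights: Dict[str, Vector],
                   label_bias: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Binary relevance: every label is scored independently, so a sentence
    can receive zero, one, or several emotion labels."""
    chosen = set()
    for label, w in label_weights.items():
        z = sum(x * y for x, y in zip(sentence_vec, w)) + label_bias[label]
        if _sigmoid(z) >= threshold:
            chosen.add(label)
    return chosen


if __name__ == "__main__":
    # Toy sentence: 3 words with 2-dimensional embeddings (made-up values).
    words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    filters = [[1.0, 0.0, 0.0, 1.0], [0.0, -1.0, -1.0, 0.0]]
    vec = conv_max_pool(words, filters, [0.0, 0.0], width=2)
    print(vec)  # [2.0, 0.0]
    weights = {"happiness": [1.0, 0.0], "like": [0.5, 1.0], "anger": [-1.0, 0.0]}
    bias = {"happiness": 0.0, "like": 0.0, "anger": 0.0}
    print(sorted(predict_labels(vec, weights, bias)))  # ['happiness', 'like']
```

Binary relevance is used here only because it is the simplest multi-label scheme; the abstract reports comparing several multi-label classifiers, and any of them could consume the same CNN sentence vector.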
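【Author affiliations】: School of Computer Science, Wuhan University; State Key Laboratory of Software Engineering, Wuhan University
【Funding】: National Natural Science Foundation of China (61303115; 61373039; 61472290); Specialized Research Fund for the Doctoral Program of Higher Education (2013014111002512)
【CLC number】: TP183; TP391.1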
Article ID: 1821484
Link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1821484.html