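Automatic Construction and Optimization of a Sentiment Lexicon Based on Word2Vec
Published: 2018-06-08 00:25
Topics: sentiment analysis + multi-category emotion classification; Reference: Computer Science (Jisuanji Kexue), 2017, No. 01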
[Abstract]: Constructing a sentiment lexicon is important foundational work in text mining. In recent years, polarity annotation in sentiment lexicons has moved from binary positive/negative labels toward multi-category emotion labels, and lexicons have become increasingly domain-specific. However, manual annotation of emotion categories is time-consuming and laborious, emotion intensity is hard to quantify accurately, and excessive focus on a single domain greatly limits a lexicon's applicability [1]. This paper trains a neural network language model on a large-scale Chinese corpus and, on that basis, proposes an automatic method for constructing a multi-dimensional sentiment lexicon based on a transformation constraint set. It then studies a sentiment-orientation disambiguation method based on word distribution density, which distinguishes and identifies the polarity of words that carry both commendatory and derogatory senses and computes the emotion category and intensity separately under each of the two orientations. Finally, it proposes a global optimization scheme based on multiple semantic resources, yielding SentiRuc, a multi-dimensional Chinese sentiment lexicon annotated with 10 emotion categories. Experiments confirm that the lexicon performs well in category annotation tests, intensity annotation tests, sentiment disambiguation, and sentiment classification tasks; the intensity test in particular confirms that the lexicon has very strong descriptive power for affective semantics.
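The abstract does not spell out the transformation constraint set, so the following is not the paper's method. It is only a minimal sketch of a simpler, commonly used idea: given word vectors (such as those Word2Vec produces), a candidate word can be assigned an emotion category and an intensity by comparing it with a few seed words per category. The function names, the two emotion categories, and the 3-dimensional toy vectors are all assumptions made up for illustration; real vectors would come from a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_emotions(word_vec, seed_vectors):
    """Map each emotion to an intensity: mean cosine similarity to its seeds, floored at 0."""
    scores = {}
    for emotion, seeds in seed_vectors.items():
        sims = [cosine(word_vec, seed) for seed in seeds]
        scores[emotion] = max(0.0, sum(sims) / len(sims))
    return scores


def label_word(word_vec, seed_vectors):
    """Return (emotion, intensity) for the highest-scoring emotion category."""
    scores = score_emotions(word_vec, seed_vectors)
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy 3-dimensional vectors, invented purely for illustration (hypothetical data,
# not taken from the paper or from any trained model).
TOY_SEEDS = {
    "joy": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "anger": [[0.0, 0.9, 0.2], [0.1, 0.8, 0.1]],
}

candidate = [0.85, 0.15, 0.05]  # hypothetical vector of an unlabeled word
label, intensity = label_word(candidate, TOY_SEEDS)
print(label, round(intensity, 3))
```

A seed-similarity scheme like this yields one category per word; handling words that carry both positive and negative senses, as the abstract describes, would need an additional disambiguation step that this sketch does not attempt.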
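[Author affiliation]: School of Information, Renmin University of China
[Funding]: National Natural Science Foundation of China (71271209); Beijing Natural Science Foundation (4132067); Ministry of Education Humanities and Social Sciences Youth Fund (11YJC630268); Open Project of the State Key Laboratory of Digital Publishing Technology
[Classification number]: TP391.1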