Research on Cross-Domain Sentiment Classification Algorithms for Chinese Short Texts
Topics: sentiment classification + cross-domain; Source: master's thesis, Chongqing University, 2016
[Abstract]: With the rapid development of e-commerce and the rise of Weibo, WeChat and similar platforms, short-text reviews on the Internet are growing exponentially, and these reviews carry considerable economic and social value. Processing them by hand is becoming increasingly impractical, so automatically mining useful information from them is an active research topic in natural language processing. Text sentiment classification arose in response, and cross-domain sentiment classification is the more practical variant because it needs no labeled reviews from the target domain. As a form of subjective text mining, sentiment classification aims to determine a reviewer's sentiment orientation and evaluative attitude (positive or negative, recommended or not recommended, and so on) toward an entity (a product, service, event, etc.). Building on a detailed study of existing sentiment classification algorithms and related techniques, this thesis proposes its own cross-domain sentiment classification algorithms. The main results are as follows: (1) A cross-domain sentiment classification algorithm based on a Sentiment Sensitive Thesaurus (SST) is proposed. To address the domain-independence problem between the source domain and the target domain in cross-domain classification, the thesis constructs an SST, uses it to expand the feature vectors of the review sets of both the source and target domains, and then uses the expanded review sets for classifier training and prediction. The SST is built on the review sets of both the source and target domains, so it contains features of both. The algorithm trains a support vector machine (SVM) on the expanded source-domain review set, and the resulting classifier predicts the labels of the expanded target-domain review set. Nine groups of experiments on data sets from three domains (hotels, computers and books) show that the SST-based cross-domain algorithm classifies well.
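The feature-expansion step described in (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how relatedness between thesaurus entries is scored, so cosine similarity over within-review co-occurrence counts is used here as a stand-in. The function names (`build_thesaurus`, `cosine`, `expand`), the `EXP=` feature prefix and the `1/rank` weighting are all hypothetical. Reviews are assumed to be already tokenized, and `k` plays the role of an expansion-size parameter.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_thesaurus(reviews):
    """Map each word to a Counter of the words it co-occurs with in a review.

    `reviews` should mix source-domain and target-domain reviews so that
    entries can relate words across the two domains.
    """
    cooc = defaultdict(Counter)
    for tokens in reviews:
        unique = set(tokens)
        for word in unique:
            for other in unique:
                if other != word:
                    cooc[word][other] += 1
    return cooc


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed stand-in measure)."""
    dot = sum(v * b[key] for key, v in a.items() if key in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def expand(tokens, thesaurus, k):
    """Return the review's bag of words plus its k most related thesaurus entries."""
    features = Counter(tokens)
    scores = Counter()
    for candidate, vector in thesaurus.items():
        if candidate in features:
            continue
        # Relatedness of a candidate to the review: sum over the review's words.
        score = sum(cosine(thesaurus[w], vector) for w in features if w in thesaurus)
        if score > 0:
            scores[candidate] = score
    # Added features are prefixed and down-weighted by rank (hypothetical scheme).
    for rank, (candidate, _) in enumerate(scores.most_common(k), start=1):
        features["EXP=" + candidate] = 1.0 / rank
    return features


# Toy data: two hotel (source) reviews and two computer (target) reviews.
source_reviews = [["room", "clean", "good"], ["room", "dirty", "bad"]]
target_reviews = [["screen", "clear", "good"], ["screen", "slow", "bad"]]
thesaurus = build_thesaurus(source_reviews + target_reviews)
expanded = expand(["screen", "clear"], thesaurus, k=2)
```

In a full pipeline the expanded source-domain bags would be vectorized and used to train an SVM, which would then classify the expanded target-domain bags; that part is omitted here to keep the sketch dependency-free.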
The thesis also examines experimentally how the algorithm's parameter K and the training-set size affect classification performance. (2) A voting-ensemble cross-domain sentiment classification algorithm is proposed. Following the idea of ensemble learning, it combines the outputs of several base classifiers to improve classification performance. Two schemes are used, simple voting and weighted voting, and experiments on the same three corpora (hotels, computers and books) show that the voting ensemble clearly outperforms any single base classifier. (3) An improved Stacking ensemble classification algorithm is proposed. The algorithm first classifies the target-domain review set with an unsupervised method based on the NTUSD sentiment dictionary, labels the reviews whose sentiment polarity is strong, and adds them to the source-domain review set. This extends the training set and reduces the difference between domains, which improves how well Stacking works in cross-domain classification. The experimental results show that the Stacking ensemble achieves good classification performance and that ensemble learning is worth studying for cross-domain sentiment classification.
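Two of the ideas in (2) and (3) are simple enough to sketch in Python: combining base-classifier outputs by simple or weighted voting, and selecting strongly polar target-domain reviews with a sentiment lexicon so they can be added to the training set. This is an illustration under stated assumptions, not the thesis's code. The function names, the `min_margin` threshold and the tiny word lists are hypothetical (the lists stand in for NTUSD, whose entries are not reproduced here). The abstract does not say how the voting weights are chosen, so they are simply passed in, and the Stacking meta-learner itself is not shown.

```python
from collections import defaultdict


def vote(predictions, weights=None):
    """Combine one label per base classifier into a single label.

    With no weights every classifier counts equally (simple voting);
    otherwise each label's score is the sum of its voters' weights.
    Ties go to the label that was seen first.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def pseudo_label(reviews, positive_words, negative_words, min_margin):
    """Label tokenized target-domain reviews by lexicon counts.

    Only reviews whose positive and negative word counts differ by at least
    `min_margin` (assumed to be >= 1) are kept, i.e. the strongly polar ones.
    """
    selected = []
    for tokens in reviews:
        positives = sum(token in positive_words for token in tokens)
        negatives = sum(token in negative_words for token in tokens)
        margin = positives - negatives
        if abs(margin) >= min_margin:
            selected.append((tokens, "pos" if margin > 0 else "neg"))
    return selected


# Three hypothetical base classifiers disagree on one review.
simple = vote(["pos", "neg", "neg"])
weighted = vote(["pos", "neg", "neg"], weights=[0.9, 0.3, 0.4])

# Toy lexicon and target-domain reviews; the middle review is too weakly polar.
extra_training = pseudo_label(
    [["good", "great", "screen"], ["good", "bad"], ["bad", "awful", "slow"]],
    positive_words={"good", "great"},
    negative_words={"bad", "awful"},
    min_margin=2,
)
```

One plausible choice for the weights is each base classifier's accuracy on held-out source-domain data, but that is an assumption rather than something the abstract states. The pairs returned by `pseudo_label` would be appended to the labeled source-domain reviews before the ensemble is trained.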
[Degree-granting institution]: Chongqing University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1
Article ID: 2096441
Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/2096441.html