Research on Microblog Sentiment Analysis Methods Based on Active Learning

Published: 2018-05-27 00:05

Topic: microblog sentiment analysis + active learning. Source: Jilin University, master's thesis, 2017

[Abstract]: Text sentiment analysis, an important branch of text mining, has attracted wide attention from researchers. With the rapid growth of the Internet and the spread of social media, a large volume of user-generated text is produced online. This text is highly subjective, carries a clear sentiment orientation and rich sentiment information, and is therefore of considerable research value. Mainstream sentiment classification methods rely heavily on machine learning, whose limitation is the need for a large annotated corpus as the training set, and annotating such a corpus is costly. In practice, what is easy to obtain is unannotated text, so how to classify text sentiment using a small amount of annotated data together with a large amount of unannotated data has become an important problem.

This thesis integrates active learning into machine-learning-based text sentiment classification in order to make effective use of unlabeled data. Because text feature matrices are sparse, using a support vector machine (SVM) as the base classifier has a clear advantage in accuracy. Margin sampling is the classical approach to active learning with SVMs, but it suffers from accuracy and performance problems such as error cascading, overfitting, and redundant iterations. To address these problems while still using an SVM as the base classifier, this thesis proposes a new active learning method, Active Learning in Informative Vector Selection (ALIVS). The main work is as follows.

First, the thesis systematically reviews the theory of text sentiment classification and active learning, covering the main tasks and research schools of text sentiment classification and the basic assumptions and mainstream methods of active learning. It also studies and analyzes the classical margin-based active learning method and identifies its limitations.

Second, starting from this theoretical review, the thesis proposes the new active learning method ALIVS. Based on the characteristics of the unlabeled sample set, it introduces the concept of the informative vector and combines it with SVMs to build a two-level classification workflow. The first-level (main) classifier performs sentiment classification. The second-level informative vector classifier uses the classification information learned by the first-level classifier to select, from the unlabeled samples, the informative vectors that carry the most classification information as candidates for labeling. After an expert labels them, they are added to the training set of the first-level classifier. Repeating this cycle steadily strengthens the first-level classifier, so that a large amount of unlabeled text and a small amount of labeled text can be used together for effective training.

Third, the method is applied to the practical scenario of Task 4 of the COAE2014 evaluation and compared with the widely used margin sampling method, with experiments designed to test and analyze its accuracy and performance. The results show that ALIVS performs well in improving accuracy and in reducing overfitting and error cascading, which demonstrates the feasibility of the method. Finally, the thesis discusses directions for future improvement and development of the method.
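The margin sampling baseline that the abstract compares against can be illustrated with a minimal sketch. It shows only the generic query step of margin-based active learning with a linear classifier (pick the unlabeled samples whose decision value is closest to zero); it is not the thesis's ALIVS procedure, whose details the abstract does not give. The function names, the sparse-dictionary feature format, and the example weights and tokens are all assumptions made for illustration.

```python
from typing import Dict, List

# Assumed feature format: token -> feature weight (e.g. a TF-IDF value).
SparseVec = Dict[str, float]


def decision_value(w: SparseVec, b: float, x: SparseVec) -> float:
    """Signed score w.x + b of a linear classifier on a sparse vector."""
    return sum(w.get(tok, 0.0) * val for tok, val in x.items()) + b


def margin_sampling(w: SparseVec, b: float,
                    pool: List[SparseVec], k: int) -> List[int]:
    """Return indices of the k pool items closest to the hyperplane."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: abs(decision_value(w, b, pool[i])))
    return ranked[:k]


# Hypothetical weights of an already-trained linear sentiment classifier.
weights = {"good": 1.5, "bad": -1.5, "ok": 0.2}
bias = 0.0

# Hypothetical unlabeled pool of microblog posts as sparse vectors.
pool = [
    {"good": 1.0},              # score  1.5, confidently positive
    {"bad": 1.0},               # score -1.5, confidently negative
    {"good": 1.0, "bad": 1.0},  # score  0.0, on the decision boundary
    {"ok": 1.0},                # score  0.2, near the boundary
]

# The two most uncertain posts are the ones sent to the annotator.
query = margin_sampling(weights, bias, pool, k=2)
print(query)  # [2, 3]
```

In a full active learning loop, the selected posts would be labeled by an expert, added to the training set, and the classifier retrained before the next query round.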
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1

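[Similar literature]

Top 10 related journal articles

1 Xu Xingkai; Emphasizing Students' Active Learning in Information Technology Classes [J]; Xiaoxue Shidai (Jiaoyu Yanjiu); 2011, No. 10
2 Liu Lanfang; On Cultivating Students' Active Learning Habits [J]; Keji Zixun; 2006, No. 30
3 Liu Baofeng; A Discussion on Shifting from Passive Learning to Active Learning [J]; Journal of Tianjin Vocational Institutes; 2012, No. 08
4 Shen Yuanyi; Research on a Resource Optimization and Allocation Scheme Based on Active Learning [J]; Journal of Foshan University (Natural Science Edition); 2006, No. 01
5 Wang Ling; Li Qin; Sui Meiling; Xiao Haijun; An Active Learning Method Based on Support Vector Machines and Its Implementation [J]; Journal of Changsha University; 2014, No. 02
6 Miao Shumin; A Discussion of STS Cases [J]; Gansu Keji Zongheng; 2005, No. 06
7 Wang Ying; Gao Xinbo; Li Jie; Wang Xiumei; A PSVM-Based Active Learning Method for Mass Detection [J]; Journal of Computer Research and Development; 2012, No. 03
8 Zhang Guiping; Li Wenbo; Wang Peiyan; Judging Ontology Concept Relations Based on Active Learning [J]; Journal of Chinese Information Processing; 2013, No. 04
9 Yang Wenjun; Exploring Reform of the University Computer Fundamentals Teaching Model: Applying the Problem-Based Model in Teaching [J]; Journal of Mudanjiang Normal University (Natural Science Edition); 2006, No. 02
10 Wei Qinbing; Exploring Reform of the University Computer Fundamentals Teaching Model: A Brief Analysis of Applying the Problem-Based Model in Teaching [J]; Zhiye Quan; 2007, No. 07

Top 1 related doctoral dissertation

1 Yao Tuozhong; Visual Scene Understanding Combined with Active Learning [D]; Zhejiang University; 2011

Top 9 related master's theses

1 Chen Xiongtao; Research on Clustering-Based Instance Selection Methods for Active Learning [D]; China University of Mining and Technology; 2016
2 Zhang Jun; Research on Text Sentiment Prediction Based on Active Learning and Transfer Learning [D]; Shanxi University; 2016
3 Guan Yafu; Research on Microblog Sentiment Analysis Methods Based on Active Learning [D]; Jilin University; 2017
4 Huang Hui; Active Learning Based on Local Linear Reconstruction Coefficients [D]; Wenzhou University; 2014
5 Cui Baojin; Research on Protein Relation Extraction Based on Semi-Supervised and Active Learning [D]; Dalian University of Technology; 2008
6 Zhang Jianghong; Application of Multi-Class Active Learning Methods to Land Surface Classification [D]; Nanjing University of Science and Technology; 2011
7 Yi Bo; Completing Semantically Incomplete Questions Based on Active Learning [D]; Harbin Institute of Technology; 2012
8 Chai Siyuan; Research on Collaborative Classification Methods Combined with Active Learning [D]; Jilin University; 2011
9 Gao Wentao; Research on Key Techniques of Active Learning in Partition-Based Classification Models [D]; Yanshan University; 2010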


Article ID: 1939511


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1939511.html
