
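Sentiment Element Extraction from Chinese Microblogs Based on CRF and Noun Phrase Recognition

Published: 2018-03-20 21:51

  Topic: sentiment elements; Approach: conditional random fields; Source: Dalian University of Technology, 2014 master's thesis; Type: degree thesis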


[Abstract]: As information technology develops, information is published and spread ever faster, and extracting valuable information from massive data is increasingly important. Weibo, a social platform that has grown rapidly in recent years, has a very large user base. Besides posting on their own initiative, users can join discussions through topics. Topics span many categories, and the discussions of many valuable topics carry their authors' subjective opinions. This thesis studies how to extract the sentiment elements of such topic microblogs, where sentiment element extraction comprises extracting sentiment targets (the objects an opinion is directed at) and judging sentiment polarity.
For polarity judgment, a Chinese microblog can carry a fairly large amount of information and a single post may contain several sentiment targets, so a machine-learning polarity classifier has difficulty drawing boundaries between them. This thesis instead builds dictionaries: dictionary matching forms sentiment units, and the sentiment value of each unit is used to judge the polarity of its sentiment target.
For sentiment target extraction, this thesis uses a conditional random field (CRF) model, combining semantic features such as word form, part of speech, whether a word is a sentiment word, and dependency information to extract sentiment targets automatically. The method performs well in closed tests but poorly in open tests. A large part of the reason is that the CRF training corpus is too small, yet manual annotation is too costly for the corpus to be enlarged easily.
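
The feature set named for the CRF (word form, part of speech, sentiment-word flag, dependency information) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the `Token` layout, the feature names, the one-token context window, and the LTP-style labels used in the usage note. The output is a list of per-token feature dictionaries, the shape that CRF toolkits such as sklearn-crfsuite accept as input.

```python
from typing import Dict, List, NamedTuple, Set


class Token(NamedTuple):
    """One segmented, tagged, parsed token (hypothetical layout)."""
    word: str     # word form
    pos: str      # part-of-speech tag
    dep_rel: str  # dependency relation to the head
    head: int     # index of the head token, -1 for the root


def token_features(sent: List[Token], i: int, lexicon: Set[str]) -> Dict[str, object]:
    """Build the feature dict for token i of one sentence."""
    tok = sent[i]
    has_head = tok.head >= 0
    feats: Dict[str, object] = {
        "word": tok.word,                   # word form
        "pos": tok.pos,                     # part of speech
        "is_sentiment": tok.word in lexicon,  # sentiment-word flag
        "dep_rel": tok.dep_rel,             # dependency information
        "head_word": sent[tok.head].word if has_head else "<ROOT>",
        "head_is_sentiment": has_head and sent[tok.head].word in lexicon,
    }
    # Assumed context window: one token to the left and right.
    if i > 0:
        feats["-1:word"] = sent[i - 1].word
        feats["-1:pos"] = sent[i - 1].pos
    else:
        feats["BOS"] = True
    if i < len(sent) - 1:
        feats["+1:word"] = sent[i + 1].word
        feats["+1:pos"] = sent[i + 1].pos
    else:
        feats["EOS"] = True
    return feats


def sentence_features(sent: List[Token], lexicon: Set[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token, in sentence order."""
    return [token_features(sent, i, lexicon) for i in range(len(sent))]
```

For the toy sentence 屏幕 / 很 / 清晰 ("the screen is very clear") with 清晰 in the sentiment lexicon, the token 屏幕 gets `head_word` 清晰 and `head_is_sentiment` set to true, which is the kind of signal that can tie a noun to the opinion expressed about it.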
Because the CRF method performs poorly on this problem, this thesis proposes a method that automatically generates a table of candidate sentiment targets based on noun phrase recognition. The method uses dependency information to filter the candidates effectively, yielding a candidate sentiment target table, which is then used to extract sentiment targets from sentences in which the CRF identified none. Experiments show that the method is fairly effective for sentiment target extraction.
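
The candidate-table fallback can be sketched roughly as follows. The abstract does not give the actual rules, so every concrete choice below is an assumption: noun phrases are approximated as maximal runs of noun-tagged tokens, the dependency filter keeps a phrase only if one of its tokens is directly linked to a sentiment word, and the fallback is plain substring matching. The names `NOUN_TAGS`, `build_candidate_table`, and `extract_targets` are hypothetical.

```python
from collections import Counter
from typing import List, NamedTuple, Set, Tuple


class Token(NamedTuple):
    word: str  # word form
    pos: str   # part-of-speech tag
    head: int  # index of the dependency head, -1 for the root


NOUN_TAGS = {"n", "nz", "ns"}  # assumed noun tag set


def noun_phrases(sent: List[Token]) -> List[Tuple[int, int]]:
    """Return (start, end) spans of maximal runs of noun-tagged tokens."""
    spans, start = [], None
    for i, tok in enumerate(sent):
        if tok.pos in NOUN_TAGS:
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(sent)))
    return spans


def linked_to_sentiment(sent: List[Token], span: Tuple[int, int], lexicon: Set[str]) -> bool:
    """Assumed filter: a span token depends on, or governs, a sentiment word."""
    start, end = span
    for i in range(start, end):
        head = sent[i].head
        if head >= 0 and sent[head].word in lexicon:
            return True
    return any(start <= tok.head < end and tok.word in lexicon for tok in sent)


def build_candidate_table(corpus: List[List[Token]], lexicon: Set[str],
                          min_count: int = 1) -> Set[str]:
    """Collect filtered noun phrases seen at least min_count times."""
    counts: Counter = Counter()
    for sent in corpus:
        for start, end in noun_phrases(sent):
            if linked_to_sentiment(sent, (start, end), lexicon):
                counts["".join(t.word for t in sent[start:end])] += 1
    return {phrase for phrase, n in counts.items() if n >= min_count}


def extract_targets(sent: List[Token], crf_targets: List[str], table: Set[str]) -> List[str]:
    """Keep CRF output when present; otherwise fall back to table lookup."""
    if crf_targets:
        return crf_targets
    text = "".join(t.word for t in sent)
    hits = [cand for cand in table if cand in text]
    return sorted(hits, key=text.find)  # order of appearance in the sentence
```

The table is only consulted when the CRF returns nothing for a sentence, which matches the division of labor described above. A fuller version would probably need longest-match handling for overlapping candidates, which this sketch leaves out.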
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP391.1; TP393.092



Article link: http://sikaile.net/guanlilunwen/ydhl/1640939.html

