Research on Sentiment Analysis of Person-Evaluation Texts
Published: 2018-05-19 21:14
Topic: Chinese text sentiment analysis + person-evaluation texts. Reference: Soochow University, doctoral dissertation, 2016.
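【Abstract】: Text sentiment analysis takes subjective text as its object of study. It applies annotation, recognition, classification, clustering and extraction to that text in order to judge, extract and summarize the sentiments and opinions it contains. Its main research topics currently include building sentiment corpora, subjective/objective classification, evaluation polarity analysis, evaluation-target extraction, text sentiment summarization and text sentiment aggregation. As mobile Internet applications spread, applications such as public-opinion analysis and product-review analysis will play a broader and more important role, and all of them rest on text sentiment analysis research. Although this research has achieved some results, it still falls well short of the needs of real applications. Work on sentiment analysis of texts that evaluate people is especially scarce. Compared with the well-studied product reviews, the sentiment expressed in person-evaluation texts has distinctive characteristics, so earlier research cannot be applied to them directly. This dissertation applies machine learning and data mining methods to sentiment analysis of person-evaluation texts. The main work covers three areas.
First, the dissertation designs a scheme for building a person-evaluation corpus based on multi-classifier fusion and active learning. The scheme yields a corpus of positive and negative person evaluations and a profanity corpus. Starting from a small manually annotated corpus, a conservative-voting multi-classifier fusion rule gradually expands a person-evaluation corpus labeled with positive and negative classes. This corpus is the foundation for sentiment analysis of person-evaluation texts. Profanity is widespread in person-evaluation texts. To handle it, a small set of profane sentences was collected and annotated by hand, and several iterations of active learning then produced a high-quality profanity corpus. Experiments show that a profanity-recognition method built on this corpus improves the precision and recall of negative-evaluation recognition.
Second, the dissertation proposes a two-layer person classification method based on a knowledge base and a search engine. Sentiment analysis is domain-dependent, and the wording of evaluations differs considerably across types of people. Sentiment analysis of person evaluations therefore urgently needs people to be divided into types. People are first classified with the knowledge base. People who cannot be found in the knowledge base are classified from the news texts returned by a search engine. Because the search engine may return noisy news, a topic-model-based algorithm for extracting relevant news was designed. Experiments show that the method classifies person types effectively.
Finally, the dissertation proposes an evaluation-element extraction method based on maximum-weight perfect matching in a bipartite graph. The method exploits the modification and constraint relations between evaluation targets and evaluation words in text, and treats targets and words as the two vertex sets of a bipartite graph. A sentence-level PMI measure that combines part-of-speech and sentence-relation information computes the weights in this graph. Its advantage is that the resulting PMI values capture the association between an evaluation target and an evaluation word at a fine granularity. The Hungarian and Kuhn-Munkres algorithms then find a maximum-weight perfect matching of the bipartite graph, and the result is filtered to obtain (evaluation target, evaluation word) pairs. Experiments show that this extraction method effectively improves extraction precision and recall.
Combining these techniques, experiments mined the main evaluation targets and common evaluation words in evaluation texts about different categories of people. They also summarized how the targets of positive and negative evaluations differ in emphasis. Overall, the main contribution of this dissertation is an in-depth study of the key problems in person-evaluation analysis. It proposes new methods for building person-evaluation sentiment corpora, classifying person types, and extracting evaluation targets and evaluation words. These methods also offer useful reference for other areas of sentiment analysis.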
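The sentence-level PMI weighting mentioned above can be compared with the standard pointwise mutual information between an evaluation target t and an evaluation word w. Below, that measure is estimated from sentence counts. This sketch assumes that probabilities are estimated over N sentences, with N(t), N(w) and N(t, w) counting the sentences that contain t, w, or both. The dissertation's part-of-speech and sentence-relation adjustments are not reproduced here.

```latex
\mathrm{PMI}(t, w) = \log_2 \frac{P(t, w)}{P(t)\,P(w)}
                   = \log_2 \frac{N \cdot N(t, w)}{N(t)\,N(w)},
\qquad
P(t) = \frac{N(t)}{N},\quad P(w) = \frac{N(w)}{N},\quad P(t, w) = \frac{N(t, w)}{N}
```

Take some hypothetical counts: N = 1000, N(t) = 50, N(w) = 40 and N(t, w) = 20. The ratio is 1000 × 20 / (50 × 40) = 10, so PMI = log2 10 ≈ 3.32. The pair co-occurs ten times more often than independence would predict, which makes it a strong candidate edge in the bipartite graph.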
[Abstract]: Text sentiment analysis takes subjective text as its research object and performs annotation, recognition, classification, clustering and extraction on it. The goal is to judge, extract and summarize the sentiments and opinions contained in the text. Its main research topics include the construction of sentiment corpora, subjective/objective classification, evaluation polarity analysis, evaluation-target extraction, text sentiment summarization and text sentiment aggregation. With the popularization of the mobile Internet, public-opinion analysis and product-review analysis will play an increasingly extensive and important role, and these applications are all based on research in text sentiment analysis. Text sentiment analysis has produced some results, yet a large gap remains between it and the needs of practical applications. Research on sentiment analysis of person-evaluation texts in particular is very scarce. Compared with the much-researched product-review texts, the sentiment expressed in person-evaluation texts has unique characteristics, and previous research cannot be applied to it directly. This dissertation uses machine learning and data mining methods to study sentiment analysis of person-evaluation texts. The main work includes the following three aspects. First, the dissertation designs a person-evaluation corpus construction scheme based on multi-classifier fusion and active learning, and obtains positive and negative person-evaluation corpora. Starting from a small manually annotated corpus, the corpus is gradually expanded using a conservative-voting multi-classifier fusion rule. This corpus is the basis for the sentiment analysis research in this dissertation.
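The conservative-voting expansion step described above can be sketched as a self-training loop. This is a minimal sketch, and its rule is an assumption: a sentence is added only when every classifier in the ensemble gives it the same label. The names `conservative_vote`, `expand_corpus` and the `train_classifiers` callback are hypothetical. The dissertation's actual classifiers and stopping criterion are not specified here.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A classifier maps a sentence to a label such as "pos" or "neg".
Classifier = Callable[[str], str]


def conservative_vote(text: str, classifiers: Sequence[Classifier]) -> Optional[str]:
    """Return the shared label if all classifiers agree, else None (abstain).

    Assumption: "conservative" is modeled here as unanimous agreement.
    """
    labels = {clf(text) for clf in classifiers}
    return labels.pop() if len(labels) == 1 else None


def expand_corpus(
    labeled: List[Tuple[str, str]],
    unlabeled: List[str],
    train_classifiers: Callable[[List[Tuple[str, str]]], Sequence[Classifier]],
    max_rounds: int = 5,
) -> List[Tuple[str, str]]:
    """Grow a labeled corpus by retraining and adding unanimously voted sentences."""
    corpus = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        # Retrain the ensemble on everything labeled so far (hypothetical callback).
        classifiers = train_classifiers(corpus)
        accepted, remaining = [], []
        for text in pool:
            label = conservative_vote(text, classifiers)
            if label is None:
                remaining.append(text)  # classifiers disagree: leave it unlabeled
            else:
                accepted.append((text, label))
        if not accepted:
            break  # nothing new could be labeled with confidence
        corpus.extend(accepted)
        pool = remaining
    return corpus
```

Requiring unanimity trades coverage for label quality. Sentences on which the classifiers disagree stay in the unlabeled pool rather than adding noisy labels to the corpus.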
Profanity is widespread in person-evaluation texts. To handle it, a small number of profane sentences were collected and annotated manually, and several iterations of active learning then built a high-quality profanity corpus. Experimental results show that a profanity-recognition method built on this corpus improves the precision and recall of negative-evaluation recognition. Second, this dissertation proposes a two-layer person classification method based on a knowledge base and a search engine. Sentiment analysis is domain-dependent, and the wording of evaluation texts differs considerably across types of people. Sentiment analysis of person evaluations therefore needs people to be classified into types. People are first classified using the knowledge base. People who cannot be found there are classified using the news texts returned by a search engine. Because the search engine may return noisy news, a topic-model-based algorithm for extracting relevant news is designed. Experimental results show that the proposed method classifies person types effectively. Finally, this dissertation proposes an evaluation-element extraction method based on maximum-weight perfect matching in a bipartite graph. The method draws on the modification and constraint relations between evaluation targets and evaluation words in text. Evaluation targets and evaluation words form the two vertex sets of the bipartite graph.
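The two-layer classification flow described above can be sketched as a knowledge-base lookup with a search-engine fallback. This sketch rests on several assumptions:

- The knowledge base is a plain name-to-type dictionary.
- `search_news`, `is_relevant` and `classify_news` are hypothetical callbacks supplied by the caller. `is_relevant` stands in for the topic-model-based news filter.
- The final type is a majority vote over the retained articles. This rule is also an assumption, not the dissertation's stated method.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional


def classify_person(
    name: str,
    knowledge_base: Dict[str, str],
    search_news: Callable[[str], List[str]],
    is_relevant: Callable[[str, str], bool],
    classify_news: Callable[[str], str],
) -> Optional[str]:
    """Two-layer person-type classification (hypothetical sketch)."""
    # Layer 1: direct lookup in the knowledge base.
    if name in knowledge_base:
        return knowledge_base[name]
    # Layer 2: fall back to news returned by the search engine, dropping
    # articles the relevance filter rejects (a topic model in the dissertation).
    articles = [a for a in search_news(name) if is_relevant(name, a)]
    if not articles:
        return None  # no usable evidence for this person
    # Assumed aggregation rule: majority vote over per-article predictions.
    votes = Counter(classify_news(a) for a in articles)
    return votes.most_common(1)[0][0]
```

Keeping the external services behind callbacks lets the control flow be exercised with stub functions before a real knowledge base or search API is connected.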
On this basis, a sentence-level PMI measure that combines part-of-speech and sentence-relation information computes the weights in the bipartite graph. The advantage of this measure is that the resulting PMI values describe the association between an evaluation target and an evaluation word at a fine granularity. Then, a maximum-weight perfect matching of the bipartite graph is found with the Hungarian and Kuhn-Munkres algorithms, and the results are filtered to obtain (evaluation target, evaluation word) pairs. Experimental results show that the proposed extraction method effectively improves extraction precision and recall. Finally, combining these techniques, experiments mined the main evaluation targets and common evaluation words in evaluation texts about different categories of people. They also summarized the different emphases of the targets of positive and negative evaluations. Overall, new methods are proposed for person-evaluation sentiment corpora, person-type classification, and extraction of evaluation targets and evaluation words. These methods also offer useful reference for other areas of sentiment analysis.
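The matching step can be illustrated with a compact implementation of the Hungarian (Kuhn-Munkres) algorithm for maximum-weight matching. It uses the classic potential-based O(n²m) formulation on negated weights. This is a sketch, not the dissertation's code. It assumes a weight matrix with evaluation targets as rows and evaluation words as columns, for instance filled with PMI values, with at least as many words as targets. The hypothetical `extract_pairs` helper models the filtering step as a simple weight threshold.

```python
import math
from typing import List, Sequence, Tuple


def max_weight_matching(weights: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Kuhn-Munkres: (row, col) pairs of a maximum-weight matching covering every row.

    Requires rows <= columns. Costs are the negated weights, so minimizing
    total cost maximizes total weight. Arrays are 1-indexed internally.
    """
    n = len(weights)
    m = len(weights[0]) if n else 0
    if n > m:
        raise ValueError("need rows <= columns; transpose the matrix")
    inf = math.inf
    u = [0.0] * (n + 1)    # row potentials
    v = [0.0] * (m + 1)    # column potentials
    p = [0] * (m + 1)      # p[j] = row matched to column j (0 = unmatched)
    way = [0] * (m + 1)    # previous column on the alternating path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = -weights[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):  # update potentials
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j] != 0)


def extract_pairs(targets: List[str], words: List[str],
                  weights: Sequence[Sequence[float]],
                  threshold: float = 0.0) -> List[Tuple[str, str]]:
    """Hypothetical filter: keep matched (target, word) pairs above a weight threshold."""
    return [(targets[i], words[j]) for i, j in max_weight_matching(weights)
            if weights[i][j] > threshold]
```

Consider a toy case with targets ["skill", "attitude"], words ["great", "arrogant"] and weights [[3.0, -1.0], [0.5, 2.0]]. The matching pairs skill with great and attitude with arrogant, for a total weight of 5.0. A threshold of 2.5 keeps only ("skill", "great").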
【Degree-granting institution】: Soochow University
【Degree level】: Doctoral
【Year awarded】: 2016
【Classification number】: TP391.1
【Similar Literature】
Related doctoral dissertations (top 1)
1 Zhu Xiaoxu (朱曉旭); Research on Sentiment Analysis of Person-Evaluation Texts [D]; Soochow University; 2016.
Article ID: 1911712
Article link: http://sikaile.net/shoufeilunwen/xxkjbs/1911712.html