Research on Opinion Target Extraction Methods in Opinion Mining

Published: 2018-11-11 11:17

[Abstract]: Opinion mining, also known as sentiment analysis, automatically analyzes the text of user reviews to determine users' sentiments, attitudes, and opinions toward products, services, people, events, and topics. It has significant theoretical and practical value. Opinion mining is either coarse-grained or fine-grained. Coarse-grained opinion mining is relatively mature, but fine-grained opinion mining still has many open problems. Opinion target extraction is an important subtask of fine-grained opinion mining: it aims to extract fine-grained opinion targets from opinionated text, such as a product itself and its components, attributes, and features. Current opinion target extraction methods fall into two main categories, supervised and unsupervised. The former are mainly based on hidden Markov models and conditional random fields, the latter on topic models and syntactic rules. Recent studies have shown that unsupervised methods based on syntactic rules perform well, but they face several challenges. The first is how to implement opinion target extraction rules quickly. The second is how to automatically select high-quality rules from a set of extraction rules of uneven quality. The third is how to use large amounts of unlabeled review text to help opinion target extraction. This thesis proposes the following solutions to these challenges. To the best of our knowledge, each is proposed here for the first time.

(1) A logic-programming-based framework for opinion target extraction is proposed so that extraction rules can be implemented quickly. The logic programming language used is Answer Set Programming (ASP). First, the part-of-speech tags and syntactic dependency relations of the words in a review sentence are represented as ASP facts. Next, known opinion target extraction rules are translated into ASP rules. Finally, an existing ASP answer set solver is used to execute the rules automatically. Experimental results show that the approach is both efficient and concise.

(2) Two automatic rule selection methods are proposed for choosing high-quality rules from a set of extraction rules of uneven quality. The first is based on a greedy algorithm, the second on a local search algorithm (simulated annealing). Experimental results show that both methods can effectively select a high-quality subset from an initial rule set of uneven quality, and that the selected subset yields better extraction results than the initial rule set.

(3) An opinion target recommendation method based on semantic similarity and relatedness is proposed so that large amounts of unlabeled review text can help opinion target extraction. First, semantic similarity and relatedness knowledge between words is learned from large amounts of unlabeled review text on the Internet. Then this knowledge, together with a small number of seed opinion targets, is used to recommend opinion targets for a new domain. Experimental results show that the method can effectively use knowledge learned from other domains to recommend high-quality opinion targets for new domains.
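The greedy rule selection in contribution (2) can be illustrated with a short Python sketch. The abstract does not state the objective function, the evaluation data, or the stopping criterion, so everything here is an assumption made for illustration: each rule is represented only by the set of targets it extracts on a small annotated set, the objective is F1 against the gold targets, and selection stops when no remaining rule improves F1. The rule names (`r_amod`, `r_nsubj`, `r_noisy`) and the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# An extraction is a (sentence id, target word) pair.
Extraction = Tuple[int, str]


def f1_score(predicted: Set[Extraction], gold: Set[Extraction]) -> float:
    """F1 of a set of predicted targets against the gold targets."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def greedy_select(rule_outputs: Dict[str, Set[Extraction]],
                  gold: Set[Extraction]) -> List[str]:
    """Repeatedly add the rule that raises F1 the most; stop when none does."""
    selected: List[str] = []
    extracted: Set[Extraction] = set()
    best_score = 0.0
    remaining = set(rule_outputs)
    while remaining:
        best_rule = None
        # Iterate in sorted order so ties are broken deterministically.
        for name in sorted(remaining):
            score = f1_score(extracted | rule_outputs[name], gold)
            if score > best_score:
                best_rule, best_score = name, score
        if best_rule is None:  # no remaining rule improves F1
            break
        selected.append(best_rule)
        extracted |= rule_outputs[best_rule]
        remaining.remove(best_rule)
    return selected


# Hypothetical toy data: gold targets and what each candidate rule extracts.
gold = {(1, "screen"), (1, "battery"), (2, "price"), (3, "camera")}
rule_outputs = {
    "r_amod": {(1, "screen"), (2, "price")},
    "r_nsubj": {(1, "battery"), (3, "camera"), (3, "phone")},
    "r_noisy": {(1, "thing"), (2, "lot"), (3, "time"), (2, "price")},
}

selected = greedy_select(rule_outputs, gold)
all_rules_f1 = f1_score(set().union(*rule_outputs.values()), gold)
selected_f1 = f1_score(set().union(*(rule_outputs[r] for r in selected)), gold)
print(selected, round(all_rules_f1, 3), round(selected_f1, 3))
```

On this toy data the noisy rule is left out, and the selected subset scores higher than the full rule set, which is the behaviour the abstract reports for rule selection in general. A simulated annealing variant would instead start from some subset and accept or reject random additions and removals, but the abstract gives no details on which to base a sketch.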
[Degree-granting institution]: Southeast University (東南大學)
[Degree level]: Doctoral
[Year degree granted]: 2016
[Classification number]: TP391.1

Related doctoral dissertations (top 1)

1 Liu Qian (劉倩); Research on Opinion Target Extraction Methods in Opinion Mining [D]; Southeast University; 2016


Article ID: 2324681


Article link: http://sikaile.net/shoufeilunwen/xxkjbs/2324681.html
