Research on Cross-Language Text Sentiment Classification Techniques

Published: 2018-08-06 08:19

[Abstract]: Text sentiment classification uses computational techniques to determine the subjective sentiment orientation expressed in a text. By mining and analyzing the interests and emotional attitudes of the people who write the text, it provides decision-makers with valuable reference information. Because effective, high-quality annotated corpora and sentiment lexicons are unevenly distributed across languages, research on cross-language text sentiment classification has emerged. Cross-language text sentiment classification uses a labeled corpus in a source language to support sentiment orientation analysis in a target language; its core problem is how to map the source and target languages into the same language space. Depending on how this mapping is achieved, existing work in China and elsewhere falls into three categories: using bilingual dictionaries, using parallel corpora to establish correspondences between the two languages, and using machine translation. This thesis makes an attempt at each of these three approaches. The main contributions are as follows:

(1) SLAB, a monolingual text sentiment analysis method within an active learning framework, is proposed. Its sampling strategy extends uncertainty sampling with a sentiment lexicon: besides selecting the most uncertain samples, it also selects samples with high sentiment scores. This compensates for a weakness of pure uncertainty sampling and thereby improves classifier accuracy. Using this sampling strategy, a cross-language text sentiment classification method, AL-CLSC, is implemented. AL-CLSC first uses machine translation to translate English texts into Chinese, then uses active learning to select "good" training samples, and through iterative training finally obtains a better Chinese text sentiment classifier.
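The abstract describes SLAB's sampling strategy only at a high level. As a minimal sketch under stated assumptions, the Python code below combines its two criteria: it first takes the texts the classifier is least sure about (from its positive-class probability) and then adds the texts with the largest lexicon-based sentiment score. The toy lexicon, the uncertainty formula 1 - |2p - 1|, the fixed split between `n_uncertain` and `n_lexicon`, and all function names are hypothetical and not taken from the thesis. In an AL-CLSC-style loop, a selection like this would run on the candidate pool in each round, with the selected texts added to the training set before the classifier is retrained.

```python
from typing import Dict, List, Sequence

# Hypothetical toy sentiment lexicon (word -> polarity); not the thesis's lexicon.
LEXICON: Dict[str, float] = {"good": 1.0, "love": 1.5, "bad": -1.0, "disappointing": -1.5}


def lexicon_score(tokens: Sequence[str], lexicon: Dict[str, float]) -> float:
    """Sentiment strength of a text: |sum of the lexicon polarities of its tokens|."""
    return abs(sum(lexicon.get(tok, 0.0) for tok in tokens))


def uncertainty(p_pos: float) -> float:
    """Binary-classifier uncertainty: 1.0 at p_pos = 0.5, 0.0 at p_pos = 0 or 1."""
    return 1.0 - abs(2.0 * p_pos - 1.0)


def select_batch(
    pool: List[Sequence[str]],
    p_pos: List[float],
    lexicon: Dict[str, float],
    n_uncertain: int,
    n_lexicon: int,
) -> List[int]:
    """Pick pool indices to label next (hypothetical combination rule).

    First the n_uncertain texts the classifier is least sure about, then the
    n_lexicon remaining texts with the strongest lexicon sentiment score.
    """
    indices = range(len(pool))
    by_uncertainty = sorted(indices, key=lambda i: uncertainty(p_pos[i]), reverse=True)
    chosen = by_uncertainty[:n_uncertain]
    taken = set(chosen)
    rest = [i for i in indices if i not in taken]
    by_score = sorted(rest, key=lambda i: lexicon_score(pool[i], lexicon), reverse=True)
    return chosen + by_score[:n_lexicon]


if __name__ == "__main__":
    pool = [["this", "good"], ["okay"], ["love"], ["bad", "disappointing"]]
    p_pos = [0.9, 0.5, 0.95, 0.2]  # classifier's P(positive) for each pool text
    print(select_batch(pool, p_pos, LEXICON, n_uncertain=1, n_lexicon=1))  # [1, 3]
```

In the usage example, index 1 is chosen because its probability of 0.5 makes it the most uncertain, and index 3 is added because its lexicon score of 2.5 is the highest among the remaining texts. Splitting the batch into two fixed quotas is only one plausible way to "also select" high-scoring samples; a weighted sum of the two scores would be another.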
Furthermore, the thesis improves AL-CLSC with a graph-structure model and proposes GAL-CLSC, which aims to mitigate the information loss, duplication, and bias that machine translation of the training corpus can introduce. Experimental results show that, across different training sets, the improved method noticeably increases classifier accuracy.

(2) Motivated by the strong recent performance of neural networks on text sentiment classification, the thesis proposes two cross-language sentiment classification methods based on deep canonical correlation analysis (DCCA): DCCA-RNN and DCCA-CNN, which combine DCCA with an RNN and a CNN, respectively. Both methods use a parallel corpus and, following the theory of deep canonical correlation analysis, learn the nonlinear relationship between the two language spaces with the RNN or CNN; cross-language text sentiment classification is then performed in the shared feature space into which canonical correlation maps both languages.
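The abstract does not give the DCCA-RNN or DCCA-CNN architectures. As a minimal sketch of the canonical-correlation step they build on, the Python/NumPy code below computes linear CCA between two views whose rows are paired, such as feature vectors for the two sides of a parallel corpus. In DCCA, the two views would instead be the outputs of trainable encoders (per the abstract, an RNN or a CNN), and the networks are trained to maximize the total correlation of the top-k canonical pairs, which `dcca_objective` computes here only as a value, without gradients. The function names, the `reg` ridge term, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def _inv_sqrt(S: np.ndarray) -> np.ndarray:
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T


def cca(H1: np.ndarray, H2: np.ndarray, k: int, reg: float = 1e-6):
    """Linear CCA between two views with paired rows (row i of H1 and row i
    of H2 describe the same parallel sentence pair).

    Returns projections A (d1 x k) and B (d2 x k) into the shared space and
    the top-k canonical correlations. `reg` is a small ridge term (an
    assumption of this sketch) that keeps the covariances invertible.
    """
    n = H1.shape[0]
    H1c = H1 - H1.mean(axis=0)
    H2c = H2 - H2.mean(axis=0)
    S11 = H1c.T @ H1c / (n - 1) + reg * np.eye(H1.shape[1])
    S22 = H2c.T @ H2c / (n - 1) + reg * np.eye(H2.shape[1])
    S12 = H1c.T @ H2c / (n - 1)
    S11_is, S22_is = _inv_sqrt(S11), _inv_sqrt(S22)
    # The singular values of T are the canonical correlations.
    T = S11_is @ S12 @ S22_is
    U, s, Vt = np.linalg.svd(T)
    A = S11_is @ U[:, :k]      # projection of view 1 into the shared space
    B = S22_is @ Vt.T[:, :k]   # projection of view 2 into the shared space
    return A, B, s[:k]


def dcca_objective(H1: np.ndarray, H2: np.ndarray, k: int) -> float:
    """Sum of the top-k canonical correlations; in DCCA, H1 and H2 would be
    the two encoders' outputs and this quantity is maximized during training."""
    return float(np.sum(cca(H1, H2, k)[2]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))                  # stand-in for source-language features
    M = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 0.0], [1.0, 0.0, 3.0]])
    Y = X @ M + 0.01 * rng.normal(size=(500, 3))   # stand-in for target-language features
    A, B, corr = cca(X, Y, k=2)
    print(corr)  # both correlations are near 1: Y is almost a linear map of X
    print(dcca_objective(X, Y, k=2))
```

In a cross-language setting, one plausible use of `A` and `B` is to project labeled source-language documents with `A`, train a sentiment classifier in the shared space, and apply it to target-language documents projected with `B`; this illustrates "classification in the shared feature space" and is not a description of the thesis's exact pipeline.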
[Degree-granting institution]: Huaqiao University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP391.1

Article No.: 2167108

Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2167108.html
