Research on Deep-Learning-Based Affective Word Vectors and Sentiment Classification Methods
Topics: sentiment analysis + word vectors. Source: Xi'an University of Technology, master's thesis, 2017
[Abstract]: Sentiment analysis identifies the sentiment polarity (positive, negative, or neutral) or sentiment intensity (strong or weak) of a given text or a segment of it, such as a sentence, phrase, or word. Applied to product reviews, it can identify how users feel about aspects of a product's design and so provide decision support for merchants and product designers. Most earlier work built recognition systems by combining manually engineered features with traditional machine learning algorithms. Manual feature engineering, however, requires expert knowledge of each domain, which makes such systems hard to deploy in practice and costly in labor. In recent years researchers have turned to deep learning to extract features automatically. One of the most fundamental results of deep learning in natural language processing is the word vector, that is, the distributed representation of words, which has been applied in many natural language processing tasks. Traditional word vectors, however, are learned from context words and carry only semantic and syntactic information, whereas the sentiment information of words is crucial for sentiment analysis. Most existing word-vector learning methods model only the syntactic context of words and ignore their sentiment, so they do not handle sentiment classification well. To address this problem, this thesis first proposes a deep-learning-based training model for affective word vectors, which uses two simple strategies to combine the sentiment information in the text with the context words of the current word.
To verify whether the learned affective word vectors accurately capture both sentiment and the semantics of context words, the thesis trains them on datasets in different languages and from different domains and runs quantitative experiments at the word level. To apply the affective semantic representation of words to long texts, the thesis builds on semi-supervised learning theory and combines the adaptive learning method of deep belief networks with active learning. This effectively addresses the sample selection problem in semi-supervised sentiment classification of long texts. The same deep network structure is used for semi-supervised active learning, so that it is trained over multiple iterations during active learning and its abstract classification ability gradually improves. Finally, to improve the efficiency of text processing in sentiment classification on massive text data, the thesis uses HDFS for distributed storage of text data and the Spark distributed in-memory parallel computing framework to parallelize text preprocessing and the deep belief network. Experiments show that the distributed deep belief network substantially shortens training time and speeds up computation.
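The abstract does not say how samples are chosen during active learning. As an illustration only, the sketch below uses entropy-based uncertainty sampling, a common active learning strategy that is assumed here and not taken from the thesis. The function names and the probability values are hypothetical; in practice the probabilities would come from the classifier (for example, a deep belief network) at each iteration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a class-probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_uncertain(unlabeled_probs, k):
    """Return indices of the k unlabeled samples whose predicted
    class distribution has the highest entropy (most uncertain first)."""
    ranked = sorted(
        range(len(unlabeled_probs)),
        key=lambda i: entropy(unlabeled_probs[i]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical classifier outputs: [P(negative), P(positive)] per review.
predictions = [
    [0.98, 0.02],  # confident negative
    [0.55, 0.45],  # uncertain
    [0.10, 0.90],  # fairly confident positive
    [0.50, 0.50],  # maximally uncertain
]

# The two most uncertain reviews would be sent for manual labeling,
# added to the labeled set, and the network retrained.
picked = select_uncertain(predictions, 2)
print(picked)  # [3, 1]
```

Each round labels the selected samples, retrains the same network, and repeats, which matches the iterative training the abstract describes at a high level.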
[Degree-granting institution]: Xi'an University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1; TP18