Sentiment Orientation Analysis of Weibo Comments Based on Deep Learning
Published: 2018-08-30 09:13
[Abstract]: With the rapid development of the mobile Internet, netizens are increasingly eager to join discussions of trending social topics, and Sina Weibo has become an important platform for them to voice opinions and express emotions; the social network built on Sina Weibo largely reflects the social behavior and emotional tendencies of Chinese users. How to quickly mine the sentiment information hidden in Sina Weibo and provide effective decision support for governments and enterprises is becoming a research hotspot in natural language processing. Traditional sentiment analysis spends a great deal of time extracting features from the data and often has to be combined with grammatical rules to achieve good results; in the big-data era, however, data volumes keep growing and manual feature extraction becomes ever more difficult. This thesis proposes to learn the sentiment information in the data by combining word vectors with deep learning: the unsupervised Word2vec and GloVe models train the data into word vectors, which replace manually engineered features and thus save labor, and deep learning models then automatically learn the sentiment information contained in the word vectors. Finally, comparative experiments verify that deep learning models achieve good results on sentence-level sentiment analysis. Specifically, the Weibo comment data are trained into two kinds of word vectors with the Word2vec and GloVe language models and fed, respectively, into the shallow learning models SVM, Logistic Regression, and Naive Bayes and the deep learning models LSTM, CNN, and LSTM+CNN. Each model learns the sentiment information hidden in the word vectors and outputs a sentiment classification, and the results are summarized with evaluation metrics such as accuracy and recall: the best shallow model reaches an accuracy close to 78.1%, while the best deep model reaches an accuracy close to 84.5%. Comparing the results shows that, relative to the shallow models, the LSTM in the deep models can retain long-distance information and the CNN can extract features of different dimensions, which together mine the sentiment information hidden in the word vectors more effectively; the shallow models, by contrast, lose the semantic information between words when mining that information, which is a main reason for their weaker performance. Compared with Word2vec vectors, GloVe vectors can exploit global co-occurrence statistics and therefore store more sentiment information, while Word2vec uses only local context, so sentiment classification with GloVe vectors outperforms classification with Word2vec vectors.
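To make the pipeline described in the abstract concrete, the following is a minimal, illustrative sketch rather than the thesis's actual code. It assumes gensim 4.x for Word2vec training and TensorFlow 2.x Keras for a stacked CNN+LSTM classifier; the toy corpus, hyper-parameters, and dummy training data are placeholder assumptions.

```python
# Illustrative sketch (not the thesis code): train Word2vec vectors on
# word-segmented Weibo comments, copy them into an embedding layer, and
# classify sentiment with a stacked CNN+LSTM model.
import numpy as np
import tensorflow as tf
from gensim.models import Word2Vec
from tensorflow.keras import layers, models

EMBED_DIM = 100   # assumed word-vector dimensionality
MAX_LEN = 60      # assumed maximum comment length in tokens

# --- Step 1: unsupervised word vectors (Word2vec; a GloVe matrix is analogous) ---
# `corpus` stands in for word-segmented Weibo comments (e.g. produced by jieba).
corpus = [["今天", "心情", "很", "好"], ["服務(wù)", "太", "差", "了"]]
w2v = Word2Vec(corpus, vector_size=EMBED_DIM, window=5, min_count=1, sg=1)

vocab = w2v.wv.index_to_key                           # word list, index 0..V-1
word_index = {w: i + 1 for i, w in enumerate(vocab)}  # reserve index 0 for padding
embedding_matrix = np.zeros((len(vocab) + 1, EMBED_DIM))
for w, i in word_index.items():
    embedding_matrix[i] = w2v.wv[w]

# --- Step 2: CNN extracts local n-gram features, LSTM keeps long-range context ---
model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(embedding_matrix.shape[0], EMBED_DIM,
                     embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
                     trainable=False),                 # freeze the pretrained vectors
    layers.Conv1D(128, 3, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),
    layers.LSTM(128),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),             # positive vs. negative
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.Recall()])

# Dummy padded index sequences and labels stand in for the labeled comment set.
x = np.random.randint(0, embedding_matrix.shape[0], size=(256, MAX_LEN))
y = np.random.randint(0, 2, size=(256,))
model.fit(x, y, epochs=1, batch_size=32, validation_split=0.1)
```

A GloVe embedding matrix could be dropped into the same Embedding layer, and the shallow baselines (SVM, Logistic Regression, Naive Bayes) could be run on averaged word vectors, e.g. with scikit-learn, to reproduce the kind of comparison the abstract describes.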
【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Master's
【Year conferred】: 2017
【CLC number】: TP391.1; TP393.092
Article ID: 2212639
Link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2212639.html