Research on Sentiment Classification Algorithms Based on SVM and Deep Learning
Topic: sentiment analysis + SVM; Source: Chongqing University of Posts and Telecommunications, 2016 master's thesis
[Abstract]: The rapid growth of the Internet has transformed traditional lifestyles and the structure of the commercial economy. From e-commerce and social apps to ride-hailing apps, the Internet has left its mark everywhere. People use social tools such as WeChat and Weibo to contact others, present themselves, and post comments. This generates a large volume of data containing views and opinions whose value is hard to estimate, and it has made large-scale text data processing a very active field. Sentiment classification of text is one of the main research topics in this field, and it is the main subject of this thesis. In current Chinese sentiment analysis, machine-learning approaches usually rely on statistical knowledge for feature extraction. As a result, they analyze complex sentence patterns poorly and cannot reflect the semantics of the text in depth. This thesis studies both problems.
To address the weak analysis of complex sentence patterns, the thesis constructs feature extraction rules for various complex sentence patterns and proposes a text sentiment analysis method based on SVM (Support Vector Machine) and complex sentence patterns. The experiments start from a combination of sentiment-word, part-of-speech, and negation-word features, then add conditional-sentence and adversative-sentence features in turn, and are repeated with different classifiers and kernels. The best classification result is 90.12%. The experiments also show that this kind of method depends heavily on manually designed, task-specific features, adapts poorly to new domains, and has difficulty covering all the information.
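The abstract does not list the actual extraction rules, so the following is only a minimal sketch of what rule-based features for complex sentences might look like. The lexicons, the feature names, and the rule that the clause after an adversative marker carries the main sentiment are all illustrative assumptions, not the thesis's method. Input sentences are assumed to be already word-segmented.

```python
# Hypothetical toy lexicons; the thesis does not publish its own in the abstract.
POSITIVE = {"好", "喜欢", "满意"}
NEGATIVE = {"差", "失望", "贵"}
NEGATIONS = {"不", "没有"}
ADVERSATIVE = {"但是", "不过", "然而"}   # "but", "however"
CONDITIONAL = {"如果", "要是"}           # "if"


def clause_polarity(tokens):
    """Sum sentiment-word polarities; a negation flips the next sentiment word."""
    score = 0
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = not negate
            continue
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)  # +1, -1, or 0
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score


def sentence_features(tokens):
    """Build a small feature dict for one pre-segmented sentence."""
    feats = {
        "polarity": clause_polarity(tokens),
        "has_conditional": int(any(t in CONDITIONAL for t in tokens)),
        "has_adversative": 0,
        "post_adversative_polarity": 0,
    }
    for i, tok in enumerate(tokens):
        if tok in ADVERSATIVE:
            feats["has_adversative"] = 1
            # Assumed rule: the clause after the first marker carries the main sentiment.
            feats["post_adversative_polarity"] = clause_polarity(tokens[i + 1:])
            break
    return feats


# "The screen is good, but the battery is not good."
example = ["屏幕", "很", "好", "但是", "电池", "不", "好"]
print(sentence_features(example))
```

In this sketch the overall polarity of the example cancels out, while the post-adversative feature stays negative. That is the kind of signal a purely statistical feature set would miss.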
To address the inability to reflect text semantics in depth, together with the problems found in the work above, the thesis introduces Word2vec, a deep-learning-based tool that can train low-dimensional word vectors carrying deep semantic information. Word vectors trained with Word2vec are used as features and fused with term-frequency weight features obtained from TF-IDF (Term Frequency-Inverse Document Frequency), and an SVM classifier achieves good results on them. After further tuning of the penalty coefficient C, the best accuracy reaches 94.37% at C = 10. The thesis also proposes a method that fuses word vectors with hash-mapping features, which likewise achieves good classification performance. Overall, combining traditional statistical features with complex-sentence features improves accuracy by 7.16% over the statistical feature combination alone. Introducing deep learning ideas, with word vectors as features fused with statistical features, improves sentiment classification further: accuracy is 4.25% higher than with the complex-sentence method (94.37% versus 90.12%), and all evaluation metrics for positive reviews improve substantially. Based on this research, the thesis designs and implements a text sentiment analysis system whose main functions are data preprocessing, word segmentation, sentiment classification, and result display.
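The abstract says that word vectors are "fused" with TF-IDF weights and, separately, with hash-mapping features, but it does not say how. The sketch below shows one plausible reading of each: a TF-IDF-weighted average of word vectors, and a plain word-vector average concatenated with hashed token counts. The function names, the smoothed idf formula, the CRC32 hash, and the toy two-dimensional embeddings are all assumptions made for illustration. In practice the embeddings would come from a trained Word2vec model, and the resulting vectors would be passed to an SVM classifier with penalty coefficient C.

```python
import math
import zlib
from collections import Counter


def tfidf_weights(docs):
    """Return one {token: tf-idf weight} dict per document (smoothed idf, assumed)."""
    n_docs = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            tok: (cnt / len(doc)) * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, cnt in counts.items()
        })
    return weights


def weighted_average(weights, embeddings, dim):
    """Weighted mean of the word vectors of all tokens that have an embedding."""
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, w in weights.items():
        if tok in embeddings:
            weight_sum += w
            for i, value in enumerate(embeddings[tok]):
                total[i] += w * value
    return [v / weight_sum for v in total] if weight_sum else total


def hashed_counts(doc, n_buckets):
    """Feature hashing: count tokens in buckets chosen by a stable CRC32 hash."""
    vec = [0.0] * n_buckets
    for tok in doc:
        vec[zlib.crc32(tok.encode("utf-8")) % n_buckets] += 1.0
    return vec


def hash_fused_vector(doc, embeddings, dim, n_buckets=8):
    """Concatenate the plain word-vector mean with the hashed token counts."""
    mean_vec = weighted_average(Counter(doc), embeddings, dim)
    return mean_vec + hashed_counts(doc, n_buckets)


# Toy data: hypothetical 2-d embeddings standing in for Word2vec output.
toy_embeddings = {"好": [1.0, 0.0], "差": [-1.0, 0.0], "手机": [0.0, 1.0]}
toy_docs = [["手机", "好"], ["手机", "差"]]

for doc, w in zip(toy_docs, tfidf_weights(toy_docs)):
    print(weighted_average(w, toy_embeddings, 2))
    print(hash_fused_vector(doc, toy_embeddings, 2))
```

With TF-IDF weighting, a word that appears in every document (here 手机, "phone") contributes less to the document vector than the distinguishing sentiment word, which is one reason such a fusion could help a downstream classifier.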
[Degree-granting institution]: Chongqing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1