Research on Sentiment Analysis of Internet Product Review Information
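Keywords: sentiment analysis, product reviews, three-way decisions, mutual information, classifier. Source: Southeast University, 2016 master's thesis. Document type: degree thesis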
[Abstract]: With the rapid growth of the Internet and e-commerce, more and more people shop online. At the same time, the sheer volume of online products and product reviews makes it harder for consumers to pick suitable products that offer good value for money. Sentiment analysis of online product reviews is therefore particularly important. Sentiment analysis in this domain takes the reviews of a given online product as samples and uses machine learning and related methods to determine their sentiment orientation automatically, revealing whether people's opinions of and attitudes toward the product are positive or negative. This thesis studies sentiment analysis of Internet product review information. Its main goal is to use computational techniques to analyze large volumes of product review text and derive its sentiment orientation, which helps consumers choose suitable products and helps merchants understand and improve their products. The work covers the following aspects.
1. Preprocessing for sentiment analysis of Internet products. Electronic products on an e-commerce website are chosen as the research object, and their review data is downloaded through Datatang. The review data is then preprocessed, mainly through Chinese word segmentation, filtering, part-of-speech tagging, data cleaning, and data classification, in preparation for the subsequent sentiment analysis of the review text.
2. Selecting the optimal part-of-speech features and proposing an improved mutual information feature selection method based on the positive-negative correlation ratio. Feature selection plays a decisive role in sentiment classification, and choosing suitable features improves classification accuracy. On the one hand, regarding part-of-speech features, the candidate text features mainly include sentiment words, adjectives, adverbs, verbs, and sentiment modal particles. The thesis shows that sentiment modal particles are a useful auxiliary signal for sentiment classification and selects the optimal combination of part-of-speech features. On the other hand, feature selection methods are compared, the shortcomings of the traditional mutual information selection method are identified, and an improved mutual information feature selection method based on the positive-negative correlation ratio is proposed. Experiments show that the proposed optimal part-of-speech feature combination and the improved mutual information feature selection algorithm achieve better classification performance.
3. Analyzing three-way decision theory and proposing a multi-decision weighted hybrid classifier. Three-way decisions perform better on problems that involve uncertainty. Based on the three-way decision idea, the thesis proposes a multi-decision weighted hybrid classifier and gives its main idea, rules, and definitions. A naive Bayes classifier and a support vector machine (SVM) classifier are each used with their own optimal thresholds to carry out two rounds of three-way decisions. Texts that remain in the boundary region are classified by probability-weighted voting between the naive Bayes classifier and the SVM classifier. Experiments show that the multi-decision weighted hybrid classifier helps improve sentiment classification accuracy and offers some advantages.
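The abstract describes the hybrid classifier only at a high level, so the following is a minimal sketch of one possible reading, not the thesis's actual rules. It assumes the two classifiers are applied as a cascade (naive Bayes first, then SVM on the texts naive Bayes defers), that each already supplies an estimate of P(positive | text), and that the thresholds and voting weights shown are purely illustrative. The names `three_way` and `hybrid_classify` are hypothetical.

```python
from typing import Optional, Tuple

POSITIVE, NEGATIVE = "positive", "negative"


def three_way(p_pos: float, alpha: float, beta: float) -> Optional[str]:
    """One three-way decision on p_pos = P(positive | text).

    Accept as positive if p_pos >= alpha, reject as negative if
    p_pos <= beta, otherwise defer (boundary region) by returning None.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if p_pos >= alpha:
        return POSITIVE
    if p_pos <= beta:
        return NEGATIVE
    return None


def hybrid_classify(
    p_nb: float,
    p_svm: float,
    nb_thresholds: Tuple[float, float] = (0.8, 0.2),   # assumed (alpha, beta)
    svm_thresholds: Tuple[float, float] = (0.7, 0.3),  # assumed (alpha, beta)
    w_nb: float = 0.4,                                 # assumed voting weights
    w_svm: float = 0.6,
) -> str:
    """Two rounds of three-way decisions, then a weighted vote.

    The cascade order (naive Bayes, then SVM) is an assumption of this
    sketch; the abstract does not state it.
    """
    # Round 1: naive Bayes decides the texts it is confident about.
    label = three_way(p_nb, *nb_thresholds)
    if label is not None:
        return label
    # Round 2: SVM decides the texts naive Bayes deferred.
    label = three_way(p_svm, *svm_thresholds)
    if label is not None:
        return label
    # Boundary region of both rounds: probability-weighted vote.
    score = (w_nb * p_nb + w_svm * p_svm) / (w_nb + w_svm)
    return POSITIVE if score >= 0.5 else NEGATIVE


if __name__ == "__main__":
    for p_nb, p_svm in [(0.9, 0.1), (0.5, 0.2), (0.6, 0.5), (0.3, 0.6)]:
        print(p_nb, p_svm, hybrid_classify(p_nb, p_svm))
```

In practice the thresholds would be tuned per classifier on held-out data, which is what the abstract calls setting each classifier's optimal threshold. The probabilities passed in would come from trained models, and the SVM output would need to be calibrated to a probability first.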
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.1
Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/1553221.html