Mining Sentiment Information from Online Product Reviews Using Multi-Feature Combinations with SVM and Probabilistic Neural Networks
Topic: SVM. Focus: probabilistic neural network. Source: Jiangsu University, 2017 master's thesis. Document type: degree thesis.
[Abstract]: With the spread of the Internet and the rapid development of e-commerce technology, people increasingly prefer to shop online. Compared with offline shopping, online shopping is convenient, saves time, and is less constrained by time and place. Before buying a product online, consumers generally browse the reviews listed below it, and after buying they post their own evaluation of the product or service. The emergence of online product reviews has also changed the point at which companies improve product quality. In traditional industrial engineering, a company changes product quality before the product leaves the production line. Now a company can obtain feedback after users have used the product, or learn users' real needs before the product is manufactured, which helps it understand consumers and improve product quality. Whereas some scholars use machine learning to compute sentiment values for product features, this thesis focuses on the sentiment orientation of review text, that is, on identifying whether a text belongs to the positive or the negative sentiment class. Reviews are processed at the clause level. SVM and a probabilistic neural network are each used to identify the sentiment orientation of clauses, and the results are compared. The probabilistic neural network is then used to predict the sentiment orientation of clauses, the product attributes mentioned in the clauses are extracted and categorized, and the distribution of consumer sentiment across the product attribute categories is obtained.

First, taking the Huawei honor 4X (暢玩版) phone on Amazon as an example, scraping rules are defined for its online product reviews, and the review data is collected with the Octopus (八爪魚) collector. The collected data is then vectorized: the valid clauses in each review are identified and preprocessed by word segmentation and stop-word removal, and features such as sentiment words, negation words, degree adverbs, and special symbols are extracted from the clauses using the corresponding dictionaries. Text vectors are then built from combinations of these features, models are built with both SVM and the probabilistic neural network, and their performance is validated to judge whether a probabilistic neural network can be used for text sentiment recognition. For each method, five groups of experiments are run with different feature combinations, and the results are used to analyze how each feature contributes to sentiment recognition.

The experimental results show, first, that the number of sentiment words and the number of negation words in a clause contribute strongly to sentiment recognition, while degree adverbs and special symbols contribute only weakly. Second, judged by both model accuracy and running time, the probabilistic neural network can be used for text sentiment recognition. The probabilistic neural network model is then applied to classify the experimental data, the product attributes in the clauses are extracted and categorized, and the distribution of consumer sentiment across the product attribute categories is obtained. The results indicate that the phone performs relatively poorly on camera and screen, and the company can improve these two aspects in its next-generation product.
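The feature construction and classifier described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the tiny English lexicons, the split of sentiment words into separate positive and negative counts, the smoothing parameter `sigma`, and the names `clause_features` and `ProbabilisticNeuralNetwork` are all hypothetical. The classifier is the standard Parzen-window form of a probabilistic neural network with a Gaussian kernel, written in pure Python.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicons; the real dictionaries are not given in the abstract.
POSITIVE = {"good", "great", "clear"}
NEGATIVE = {"bad", "blurry", "slow"}
NEGATIONS = {"not", "never", "no"}
DEGREE_ADVERBS = {"very", "extremely", "quite"}
SPECIAL_SYMBOLS = {"!", "?"}


def clause_features(tokens):
    """Map a tokenized clause to counts of each feature type (an assumed encoding)."""
    lexicons = (POSITIVE, NEGATIVE, NEGATIONS, DEGREE_ADVERBS, SPECIAL_SYMBOLS)
    return [float(sum(t in lex for t in tokens)) for lex in lexicons]


class ProbabilisticNeuralNetwork:
    """Parzen-window classifier with a Gaussian kernel."""

    def __init__(self, sigma=0.5):
        self.sigma = sigma                 # kernel width (assumed value)
        self.patterns = defaultdict(list)  # pattern layer: stored vectors per class

    def fit(self, vectors, labels):
        for x, y in zip(vectors, labels):
            self.patterns[y].append(x)
        return self

    def class_scores(self, x):
        """Summation layer: average kernel activation per class."""
        scores = {}
        for label, stored in self.patterns.items():
            total = 0.0
            for p in stored:
                sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
                total += math.exp(-sq_dist / (2 * self.sigma ** 2))
            scores[label] = total / len(stored)
        return scores

    def predict(self, x):
        """Decision layer: pick the class with the largest score."""
        scores = self.class_scores(x)
        return max(scores, key=scores.get)


# Made-up training clauses, already tokenized.
train = [
    (["screen", "is", "very", "good"], "positive"),
    (["great", "phone", "!"], "positive"),
    (["camera", "is", "blurry"], "negative"),
    (["screen", "is", "not", "good"], "negative"),
]
model = ProbabilisticNeuralNetwork(sigma=0.5).fit(
    [clause_features(tokens) for tokens, _ in train],
    [label for _, label in train],
)
print(model.predict(clause_features(["camera", "is", "not", "good"])))  # prints "negative"
```

Dropping or keeping individual entries of the feature vector is one way to reproduce the idea of comparing feature combinations. Count features alone cannot model negation scope, so a clause like "not good" is only separated from "good" here because a matching training example exists.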
[Degree-granting institution]: Jiangsu University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification numbers]: TP183; F713.36