Research on Methods for Extracting Implicit Opinion Targets from Online Shopping User Reviews
Published: 2019-03-16 11:20
[Abstract]: With the rapid development of e-commerce in China, online shopping has become part of everyday life. Because of information asymmetry, consumers find it hard to learn the true condition of goods, so online user reviews serve as a reference for purchase decisions, and opinion mining over such reviews has attracted wide scholarly attention. Opinion targets (evaluation objects), one aspect of opinion mining, have also been studied extensively, but existing work concentrates on explicit opinion targets and rarely takes implicit ones into account. For researchers, studying implicit opinion targets can improve the accuracy of opinion target research. For enterprises, fully mining implicit opinion targets draws attention to the targets hidden in consumer reviews and gives a more complete picture of how consumers experience each aspect of a product. For individual consumers, an e-commerce platform that extracts implicit opinion targets can display or recommend reviews that are more faithful, so that consumers obtain more precise opinions from other users on each aspect of a product.

Based on this, this thesis studies the mining of implicit opinion targets in user reviews. The main work is as follows.

(1) Data preprocessing. Real user review data is crawled from the Taobao website with a data-crawling tool, and the text is then processed by sentence segmentation, word segmentation, feature selection, and vector representation. Because the initial feature-word space has a high dimension, a particle swarm optimization (PSO) algorithm based on simulated annealing is applied to the feature set as a second round of feature extraction, which reduces the dimension of the feature-word space. Experimental results show that the dimension drops from 425 to 296, indicating that the method performs effective feature selection.

(2) Cluster analysis of explicit evaluation sentences. Evaluation sentences are divided into explicit and implicit evaluation sentences, and text clustering is carried out on the explicit ones. Even after the sentences are represented by feature words, the dimension of the text vector space remains high, so the thesis adopts the FCM (fuzzy c-means) clustering algorithm, which it considers suitable for high-dimensional data sets. Because FCM easily falls into local optima, the thesis proposes an improved FCM algorithm based on simulated annealing, which controls the FCM iteration process so that the algorithm avoids becoming trapped in a local optimum. In the experiments, the explicit evaluation sentences are grouped into 9 clusters, and each cluster is given a category name. The results show that the improved FCM algorithm clusters the text reasonably.

(3) Extraction of opinion targets from implicit evaluation sentences. After the explicit evaluation sentences are clustered, the sentences in each category are gathered into one document set. Because a mapping exists among the opinion target, the opinion word, and the category of an evaluation sentence, the thesis uses an association rule algorithm to mine the association rules of each document set and builds an association rule table linking categories, opinion targets, and opinion words. Implicit opinion targets are then extracted on the basis of this table. Comparative experiments show that the proposed implicit opinion target extraction method reaches an accuracy of 75.26% and can effectively improve the accuracy of text classification.
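The idea in step (3), using associations learned from explicit sentences to infer the target of an implicit one, can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the function names `build_rule_table` and `infer_target`, the thresholds, and the toy English data are all assumptions made for this example, and the category dimension of the rule table is omitted for brevity. Support is taken here as a raw co-occurrence count and confidence as the share of an opinion word's occurrences that go with a given target.

```python
from collections import Counter, defaultdict


def build_rule_table(pairs, min_support=2, min_confidence=0.6):
    """Mine rules 'opinion word -> target' from (target, opinion_word) pairs.

    Hypothetical helper: support is the raw count of the pair, confidence is
    count(target, word) / count(word). Thresholds are illustrative defaults.
    """
    pair_counts = Counter(pairs)
    word_counts = Counter(word for _, word in pairs)
    table = defaultdict(list)
    for (target, word), count in pair_counts.items():
        confidence = count / word_counts[word]
        if count >= min_support and confidence >= min_confidence:
            table[word].append((target, confidence))
    # Keep the most confident target first for each opinion word.
    for word in table:
        table[word].sort(key=lambda item: item[1], reverse=True)
    return dict(table)


def infer_target(tokens, table):
    """Return the most confident target implied by any token, or None."""
    best = None
    for token in tokens:
        for target, confidence in table.get(token, []):
            if best is None or confidence > best[1]:
                best = (target, confidence)
    return best[0] if best else None


# Toy (target, opinion word) pairs standing in for explicit evaluation
# sentences; invented for illustration, not data from the thesis.
explicit_pairs = [
    ("price", "cheap"), ("price", "cheap"),
    ("price", "expensive"), ("price", "expensive"),
    ("logistics", "fast"), ("logistics", "fast"), ("logistics", "fast"),
    ("service", "fast"),
]
rule_table = build_rule_table(explicit_pairs)

# An implicit sentence such as "very cheap" names no target explicitly.
print(infer_target(["very", "cheap"], rule_table))
```

In this sketch a sentence with no matching opinion word yields `None` and is left unlabeled; a fuller version would presumably key the table by cluster category as well, as the abstract describes.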
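[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1
Article ID: 2441230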
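[Citing literature]
Related journal articles (top 1)
1 Han Zhongming; Li Mengqi; Liu Wen; Zhang Mengmei; Duan Dagao; Yu Chongchong. Survey of aspect-level opinion mining methods for online reviews (網絡評論方面級觀點挖掘方法研究綜述) [J]. Journal of Software (軟件學報), 2018, Issue 02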
Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/2441230.html