Research on a Method for Extracting Implicit Evaluation Objects from Online Shopping User Reviews
Published: 2019-03-16 11:20
[Abstract]: With the rapid development of e-commerce in China, online shopping has become part of people's daily lives. Information asymmetry makes it hard for consumers to learn the true condition of a product, so online user reviews serve as a reference for purchase decisions, and opinion mining on online reviews has attracted wide attention from researchers. Evaluation objects, one facet of opinion mining, have also been studied extensively, but existing work concentrates on explicit evaluation objects; few researchers take implicit evaluation objects into account. Research on implicit evaluation objects benefits three groups. For researchers, it can improve the accuracy of evaluation-object research. For enterprises, fully mining implicit evaluation objects brings the opinion targets hidden in consumer reviews to their attention and gives a more complete picture of consumers' experience with every aspect of a product. For individual consumers, an e-commerce platform that extracts implicit evaluation objects can display or recommend useful reviews that better reflect reality, so consumers obtain more precise opinions from other users on each aspect of a product. This thesis therefore studies the mining of implicit evaluation objects in user reviews. The main work is as follows.
(1) Data preprocessing. Real user reviews were collected from the Taobao website with a web-crawling tool, and the text was then split into sentences, segmented into words, subjected to feature selection, and represented as vectors. Because the initial feature-word space is high-dimensional, a particle swarm optimization (PSO) algorithm based on simulated annealing performs a secondary feature extraction on the feature set to reduce its dimensionality. Experiments show that this reduces the feature-word space from 425 to 296 dimensions, indicating that the method selects features effectively.
(2) Clustering of explicit evaluation sentences. Evaluation sentences are divided into explicit and implicit evaluation sentences, and the explicit ones are clustered. Even after each sentence is represented by feature words, the text vector space remains high-dimensional, so the fuzzy C-means (FCM) clustering algorithm, which suits high-dimensional data sets, is adopted. Because FCM easily falls into local optima, an improved FCM algorithm based on simulated annealing is proposed; by controlling the iterative process of FCM, it effectively avoids local optima. In the experiments the explicit evaluation sentences are grouped into 9 clusters, and each cluster is assigned a category name. The results show that the improved algorithm clusters the text reasonably.
(3) Extraction of evaluation objects from implicit evaluation sentences. After the explicit evaluation sentences are clustered, sentences of the same category are gathered into one document set. Because a mapping exists among the evaluation object, the evaluation word, and the category of an evaluation sentence, an association rule algorithm is used to mine rules from each document set, and a rule table linking categories, evaluation objects, and evaluation words is built; implicit evaluation objects are then extracted on the basis of this table. Comparative experiments show that the proposed extraction method reaches an accuracy of 75.26% and can effectively improve the accuracy of text classification.
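Step (1) names a simulated-annealing-based PSO for secondary feature selection but does not say how the two techniques are combined. The following is a minimal sketch under stated assumptions, not the thesis's implementation: a binary PSO with a sigmoid transfer function in which each particle's personal best is updated by the Metropolis acceptance rule of simulated annealing under geometric cooling. The objective `fitness` (summed relevance of the kept features minus a per-feature penalty), the relevance scores, and every parameter value are hypothetical placeholders; a real setup would need a task-specific objective.

```python
import math
import random


def fitness(mask, relevance, penalty=0.05):
    """Hypothetical objective: summed relevance of kept features minus a size penalty."""
    kept = [r for r, bit in zip(relevance, mask) if bit]
    if not kept:
        return float("-inf")  # an empty feature set is never acceptable
    return sum(kept) - penalty * len(kept)


def sa_pso_select(relevance, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
                  t0=1.0, cooling=0.95, vmax=4.0, seed=0):
    """Binary PSO whose personal-best update uses a simulated-annealing acceptance rule.

    Returns (best_mask, best_fitness, history), where history[k] is the global-best
    fitness after iteration k.
    """
    rng = random.Random(seed)
    n = len(relevance)
    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, relevance) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    temp, history = t0, []
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))  # clamp the velocity
                # Sigmoid transfer: the velocity sets the probability that bit d is 1.
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i], relevance)
            delta = f - pbest_fit[i]
            # Metropolis rule: always accept improvements, accept worse moves with
            # probability exp(delta / temp), which shrinks as the swarm cools.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                pbest[i], pbest_fit[i] = pos[i][:], f
            if f > gbest_fit:  # the global best only ever improves
                gbest, gbest_fit = pos[i][:], f
        history.append(gbest_fit)
        temp *= cooling
    return gbest, gbest_fit, history


if __name__ == "__main__":
    rng = random.Random(1)
    relevance = [rng.uniform(0.0, 0.1) for _ in range(30)]  # hypothetical scores
    mask, best, _ = sa_pso_select(relevance)
    print(f"kept {sum(mask)} of {len(mask)} features, fitness {best:.4f}")
```

Accepting an occasionally worse personal best while the temperature is high keeps the swarm exploring; as the temperature falls, the rule approaches the greedy update of ordinary PSO.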
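Step (2) says the improved FCM uses simulated annealing to control the iterative process. One common way to do this, sketched below under that assumption and not necessarily the thesis's exact scheme, alternates a standard FCM update with a random perturbation of the cluster centers that is accepted by the Metropolis rule on the FCM objective J = Σ_i Σ_j u_ij^m ||x_j − v_i||², keeping the best centers seen. The function names, the fuzzifier m = 2, the cooling schedule, and the perturbation scale are hypothetical choices; points are plain coordinate tuples rather than real sentence vectors.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(data, centers, m):
    """Standard FCM membership u[i][j] of point j in cluster i."""
    u = [[0.0] * len(data) for _ in centers]
    for j, x in enumerate(data):
        d = [max(dist2(x, v), 1e-12) for v in centers]  # guard against zero distance
        for i in range(len(centers)):
            # (d_ij / d_kj)^(2/(m-1)) written with squared distances.
            u[i][j] = 1.0 / sum((d[i] / dk) ** (1.0 / (m - 1)) for dk in d)
    return u


def update_centers(data, u, m):
    """Standard FCM center update: mean of the points weighted by u^m."""
    centers = []
    for ui in u:
        w = [uij ** m for uij in ui]
        total = sum(w)
        centers.append([sum(wj * x[k] for wj, x in zip(w, data)) / total
                        for k in range(len(data[0]))])
    return centers


def objective(data, centers, m):
    """FCM objective J = sum_i sum_j u_ij^m * ||x_j - v_i||^2."""
    u = memberships(data, centers, m)
    return sum(u[i][j] ** m * dist2(x, v)
               for i, v in enumerate(centers) for j, x in enumerate(data))


def sa_fcm(data, c, m=2.0, t0=1.0, cooling=0.9, t_min=1e-3, step=0.5, seed=0):
    """FCM steps interleaved with annealed random perturbations of the centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, c)]
    best, best_j = [v[:] for v in centers], objective(data, centers, m)
    temp = t0
    while temp > t_min:
        # One ordinary FCM iteration from the current centers.
        centers = update_centers(data, memberships(data, centers, m), m)
        cur_j = objective(data, centers, m)
        if cur_j < best_j:
            best, best_j = [v[:] for v in centers], cur_j
        # Perturb every center; the noise scale shrinks with the temperature.
        cand = [[a + rng.gauss(0.0, step * temp) for a in v] for v in centers]
        cand_j = objective(data, cand, m)
        # Metropolis rule on J: keep better candidates, sometimes keep worse ones.
        if cand_j <= cur_j or rng.random() < math.exp((cur_j - cand_j) / temp):
            centers = cand
        temp *= cooling
    return best, memberships(data, best, m), best_j


if __name__ == "__main__":
    points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    centers, u, j = sa_fcm(points, 2)
    print([[round(a, 2) for a in v] for v in centers], round(j, 3))
```

The perturbation lets the search leave a poor configuration that plain FCM would settle into, while tracking the best J seen means a late uphill move cannot make the returned result worse.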
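Step (3) builds a rule table linking categories, evaluation objects, and evaluation words, but the abstract does not give the rule form or the scoring. The sketch below assumes rules of the form {evaluation word} → evaluation object, mined separately inside each cluster's document set using ordinary support (co-occurrence count) and confidence. It resolves an implicit sentence by summing the confidence of every rule its evaluation words fire. The rule form, thresholds, scoring, helper names, and toy clusters are all hypothetical.

```python
from collections import Counter, defaultdict


def mine_rules(clusters, min_support=2, min_confidence=0.5):
    """Mine rules {evaluation word} -> evaluation object inside each cluster.

    clusters maps a category name to the explicit sentences in that cluster,
    each given as (evaluation_object, [evaluation words]).
    Returns {evaluation_word: [(category, evaluation_object, support, confidence)]}.
    """
    table = defaultdict(list)
    for category, sentences in clusters.items():
        word_count = Counter()   # sentences containing the word
        pair_count = Counter()   # sentences containing the word and the object
        for obj, words in sentences:
            for word in set(words):
                word_count[word] += 1
                pair_count[(word, obj)] += 1
        for (word, obj), support in pair_count.items():
            confidence = support / word_count[word]
            if support >= min_support and confidence >= min_confidence:
                table[word].append((category, obj, support, confidence))
    return dict(table)


def extract_implicit(words, table):
    """Score (category, object) pairs by the summed confidence of the rules that fire."""
    scores = Counter()
    for word in words:
        for category, obj, _support, confidence in table.get(word, []):
            scores[(category, obj)] += confidence
    return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    clusters = {  # hypothetical toy clusters of explicit sentences
        "price": [("price", ["cheap"]), ("price", ["cheap", "worth"]),
                  ("price", ["expensive"]), ("cost", ["cheap"])],
        "logistics": [("delivery", ["fast"]), ("delivery", ["fast"]),
                      ("shipping", ["slow"])],
    }
    table = mine_rules(clusters)
    # An implicit sentence: it names no evaluation object, only an evaluation word.
    print(extract_implicit(["arrived", "really", "fast"], table))  # ('logistics', 'delivery')
```

Because rules are mined per cluster, the same evaluation word can point to different evaluation objects in different categories; summing confidences is one simple way to settle such conflicts.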
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP391.1
Article number: 2441230
Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/2441230.html