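Research and Application of an Opinion-Word-Based Method for Extracting Implicit Product Features
Topics: opinion words + implicit product features; Source: Donghua University, master's thesis, 2016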
[Abstract]: E-commerce is growing rapidly, and e-commerce platforms generate large volumes of data every day, including purchase records and product reviews. Analyzing review text to determine users' sentiment toward a product gives merchants and other users a useful reference. However, knowing only the sentiment, without knowing which product feature the user is commenting on, leaves merchants unsure how to improve the product and prevents other users from comparing options. To make review analysis more fine-grained, feature mining based on opinion words is therefore necessary. Product features can be divided into explicit and implicit product features. Most existing research focuses on extracting explicit product features, while the extraction of implicit product features has received less attention. Against this background, this thesis takes implicit product feature extraction as its research goal and review corpora as its object of study. The work can be summarized as follows:
(1) Existing methods for extracting valid words consider only word frequency, which is not comprehensive enough. This thesis proposes a combined weighting method for building the opinion-word and context-word lexicons. The proposed lexicon-building algorithm takes into account four aspects that affect the validity of a word and combines them with weights. Experiments show that the method improves the accuracy of opinion-word and context-word lexicon construction.
(2) Existing context-based algorithms consider only the context within a single review sentence, which is one-sided. This thesis proposes a joint topic-opinion word model (JTO), which adds a context layer to the LDA model to obtain the context probability distribution of each opinion word over the whole review set. Experiments show that this method extracts implicit product features more accurately than existing context-based implicit product feature extraction methods.
(3) Because the context information in a review sentence is not necessarily reliable, this thesis proposes an extraction method that takes context weights into account in order to assess the credibility of context information. The method improves the co-occurrence matrix by considering the distance between words, and computes context weights from two probability distributions and cosine similarity. Experiments show that the method improves both recall and precision.
(4) Based on the two models above, this thesis implements a prototype system for opinion-word-based implicit product feature extraction and applies it to the 榮華餅家 (Ronghua Bakery) project.
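The abstract does not give formulas for contribution (3), so the following is only a minimal sketch of the two ingredients it names: a co-occurrence matrix that accounts for the distance between words, and cosine similarity between two distributions. The function names, the `window` parameter, and the 1/distance decay are assumptions made for illustration and are not taken from the thesis.

```python
from collections import defaultdict
from math import sqrt


def distance_weighted_cooccurrence(sentences, window=3):
    """Accumulate symmetric co-occurrence scores for tokenized sentences.

    Each pair of words at most `window` positions apart contributes
    1/distance, so nearer words count more (an assumed decay function).
    """
    counts = defaultdict(float)
    for tokens in sentences:
        for i, left in enumerate(tokens):
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                weight = 1.0 / (j - i)
                counts[(left, tokens[j])] += weight
                counts[(tokens[j], left)] += weight
    return counts


def cosine_similarity(p, q):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(value * q.get(key, 0.0) for key, value in p.items())
    norm_p = sqrt(sum(value * value for value in p.values()))
    norm_q = sqrt(sum(value * value for value in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # similarity with an empty vector is defined as 0 here
    return dot / (norm_p * norm_q)
```

One way such pieces might be combined is to compare the context distribution of an opinion word within a single sentence against its distribution over the whole review set, and to treat a low cosine similarity as a sign that the sentence-level context is less trustworthy. That reading is an interpretation of the abstract, not a description of the author's implementation.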
[Degree-granting institution]: Donghua University
[Degree level]: Master's
[Year degree conferred]: 2016
[Classification number]: TP391.1