Research on Mining Techniques for Product Reviews on B2C Websites
Published: 2018-05-30 15:03
Topics: product reviews + review mining. Source: Beijing Jiaotong University, master's thesis, 2014
[Abstract]: As the B2C market grows, the number of consumer reviews of products on the Internet is increasing explosively. These reviews contain much information of value to both merchants and consumers, so identifying and exploiting that information accurately and efficiently promises considerable economic benefit and broad application prospects. This has made the mining and analysis of product reviews a research focus in recent years. This thesis takes mobile phone reviews from Jingdong Mall, a large B2C website, as its research object and studies two aspects of review text: sentiment classification and sentiment polarity analysis. The main work is as follows.

Sentiment classification of review text is studied using support vector machines (SVM) and naive Bayes (NB). A training set is obtained by manually selecting reviews collected online. The corpus is then preprocessed with the NLPIR word segmentation system, and the weights of feature words are computed with TF-IDF. Finally, the MI, IG, and CHI feature selection methods are compared experimentally on the SVM and NB classifiers. The results show that with CHI feature selection, the classification performance of both SVM and NB exceeds 80%. With the same feature selection method, SVM outperforms NB, and its accuracy reaches 83%.

Fine-grained sentiment polarity analysis of review text is performed with a "bidirectional iteration method" based on the proximity principle. A sentiment seed set is first built with the PMI-IR algorithm, and the bidirectional iteration method is then used to obtain association pairs of feature words and sentiment words. On this basis, a method for constructing a sentiment lexicon is proposed, and a HowNet-based triple sentiment lexicon, Tri-HowNet, is built. Two polarity determination methods, one based on the HowNet polarity lexicon and one based on the Tri-HowNet sentiment lexicon, are compared experimentally. The results show that the latter performs better than the former when determining the polarity of sentiment words that have multiple senses.

A review mining system based on the SSH framework is designed and implemented. The system comprises five modules: lexicon maintenance, review collection, review classification, review sentiment analysis, and visualization. Reviews are collected through the interface provided by the open-source Java library Crawler4j, using a simulated login via POST. The collected reviews are then analyzed from two directions, sentiment classification and sentiment analysis. Finally, the results are stored in the product analysis database and can be displayed as 3D bar charts, which makes them convenient for users to query and use.
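As a rough illustration of the CHI feature selection step named in the abstract, the Python sketch below scores each term with the chi-square statistic computed from a 2x2 term/class contingency table. The function name, the toy English tokens, and the labels are all hypothetical. The thesis's actual corpus, NLPIR segmentation output, and selection thresholds are not given in this record, so this is only a minimal sketch under those assumptions.

```python
from collections import Counter


def chi_square_scores(docs, labels, target="pos"):
    """Score each term by its chi-square statistic against one class.

    docs   -- list of token lists (assumed to be already segmented)
    labels -- list of class labels, one per document
    """
    n = len(docs)
    n_target = sum(1 for y in labels if y == target)
    df_all = Counter()     # number of documents containing the term
    df_target = Counter()  # same count, restricted to the target class
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # count each term once per document
            df_all[term] += 1
            if y == target:
                df_target[term] += 1

    scores = {}
    for term, df in df_all.items():
        a = df_target[term]   # target class, contains term
        b = df - a            # other classes, contains term
        c = n_target - a      # target class, lacks term
        d = n - n_target - b  # other classes, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        # A zero denominator means the term or class is constant: no signal.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Hypothetical toy corpus standing in for segmented phone reviews.
docs = [
    ["screen", "clear", "good"],
    ["battery", "good", "fast"],
    ["screen", "bad", "slow"],
    ["battery", "bad", "hot"],
]
labels = ["pos", "pos", "neg", "neg"]

scores = chi_square_scores(docs, labels)
top_terms = sorted(scores, key=scores.get, reverse=True)[:2]
print(top_terms)
```

In a pipeline like the one the abstract describes, the highest-scoring terms would presumably be kept as features and then weighted with TF-IDF before training the SVM or NB classifier. Terms that appear equally often in both classes, such as "screen" in the toy data, score zero and would be dropped.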
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP391.1; TP393.092
Article ID: 1955723
Article link: http://sikaile.net/guanlilunwen/ydhl/1955723.html