
Research on Product Feature Extraction and Sentiment Analysis Based on Chinese Online Reviews

Published: 2018-04-26 03:31

  Topic: review mining + feature extraction; Source: Southeast University, master's thesis, 2016


[Abstract]: With the spread of Internet applications and the rapid growth of e-commerce, online shopping has become a common and important mode of consumption. Online reviews are an important data asset of e-commerce websites: they are the texts that users publish after buying a product online, expressing subjective or objective attitudes and opinions about it, and they hold great potential value for both online shoppers and merchants. Reading and understanding such a mass of reviews manually is clearly infeasible. Review mining offers an effective way to address this problem and has become a research hotspot in China and abroad. Review mining consists mainly of two parts, feature extraction and sentiment analysis. Focusing on the mining of Chinese online reviews, this thesis carries out the following work:

1) Construction of a Chinese online review corpus for the electronics domain. A customized crawler automatically fetches and parses the HTML of electronics product reviews on JingDong and Taobao. The raw reviews are then filtered and cleaned using the initial review filtering criteria proposed in the thesis, segmented with the Chinese Academy of Sciences word segmentation tool, and stripped of stop words. Word frequencies are computed and stored in a database, and the preprocessed data are finally stored in an ES cluster.

2) An efficient two-pass pruning algorithm for feature extraction from Chinese online reviews. Building on traditional sequential pattern mining, and addressing its insufficient precision and recall, the thesis combines the traditional GSP algorithm with a statistics-based word-pair co-occurrence method to extract and prune features. The resulting feature set lays the groundwork for the subsequent sentiment analysis.

3) Construction of Chinese syntactic patterns. A syntactic parser is used to parse the reviews, and the frequency of each dependency relation in the corpus is counted. By studying the dependency patterns in light of the characteristics of online reviews, seven dependency patterns are constructed, and an extraction algorithm based on semantic distance and punctuation is proposed to extract tuples consisting of a feature and an opinion.

Finally, the thesis builds a classification feature model based on 11 features and uses SVM, logistic regression, and Bayesian algorithms as classifiers in several experiments that compare against a baseline model. Through feature selection and ranking, the five features most relevant to the classification result are identified. The experimental results demonstrate the effectiveness and ease of use of the proposed method.
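To make the co-occurrence pruning idea in step 2 more concrete, here is a minimal Python sketch. The abstract does not give the thesis's formula, so the score used here (sentences containing both words, divided by the sentence count of the rarer word), the threshold, the function names, and the toy data are all assumptions for illustration only. The candidate pairs are assumed to come from an earlier GSP pass, which is not shown.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_scores(sentences):
    """Score each unordered word pair by how often the two words share a sentence.

    score(a, b) = n(a, b) / min(n(a), n(b)), where n counts sentences.
    This ratio is an assumed stand-in, not the formula used in the thesis.
    """
    word_count = Counter()
    pair_count = Counter()
    for tokens in sentences:
        unique = sorted(set(tokens))  # count each word once per sentence
        word_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return {
        pair: n / min(word_count[pair[0]], word_count[pair[1]])
        for pair, n in pair_count.items()
    }


def prune_candidates(candidates, scores, threshold=0.5):
    """Keep two-word candidate features whose score meets the threshold."""
    kept = []
    for a, b in candidates:
        key = tuple(sorted((a, b)))  # same ordering as in cooccurrence_scores
        if scores.get(key, 0.0) >= threshold:
            kept.append((a, b))
    return kept


# Hypothetical toy data: English tokens stand in for segmented Chinese words.
reviews = [
    ["screen", "resolution", "very", "high"],
    ["screen", "resolution", "nice"],
    ["battery", "endurance", "very", "good"],
    ["screen", "very", "good"],
]
# Candidate pairs as a GSP pass might propose them (assumed input).
candidates = [("screen", "resolution"), ("battery", "endurance"), ("screen", "good")]

scores = cooccurrence_scores(reviews)
features = prune_candidates(candidates, scores, threshold=0.6)
print(features)
```

On this toy input, the pair ("screen", "good") scores 0.5 and is pruned, while the other two pairs score 1.0 and are kept. A real pipeline would tune the threshold on labeled data and would likely restrict co-occurrence to a window rather than a whole sentence.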
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1



Article ID: 1804319


Link to this article: http://sikaile.net/jingjilunwen/dianzishangwulunwen/1804319.html

