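Construction of a Sentiment Topic Model Based on Nonparametric Bayesian Methods
Published: 2018-07-06 10:52
Topic: sentiment analysis + fine-grained; Reference: Southwest University of Science and Technology, 2016 master's thesis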
[Abstract]: With the rise of Weibo, blogs, and e-commerce websites in recent years, user participation and activity have grown steadily, and best-selling products and trending news events have generated massive volumes of comments. Mining these texts reveals how users evaluate products and what they think about social events, which is valuable for merchants' product development, users' purchase decisions, and governments' public-opinion monitoring and policy making. Analyzing and processing this text has therefore become urgent, and text sentiment analysis is one of the main tasks involved. This thesis studies fine-grained sentiment analysis and, drawing on nonparametric Bayesian methods, proposes a user sentiment model oriented toward product attributes. The main research content is as follows. First, an examination of traditional sentiment models applied to user sentiment in product reviews identifies two main problems: they lack fine-grained sentiment analysis at the level of product attributes, and the number of automatically extracted product attributes must be fixed in advance. Next, a fine-grained, product-attribute-oriented user sentiment model is proposed. A hierarchical Dirichlet process (HDP) first clusters noun entities into product attributes and determines their number automatically. Then the weights of the noun entities within each product attribute, the evaluation phrases, and a sentiment lexicon are combined as priors, and latent Dirichlet allocation (LDA) is used to classify the sentiment of each product attribute. Finally, mobile phone reviews were collected from Taobao and JingDong, and the Apple phone reviews were selected as the experimental dataset. The experimental results show that the model achieves high sentiment classification accuracy, with an average accuracy of 87%. Compared with traditional sentiment models, the model is more accurate both in extracting product attributes and in classifying the sentiment of evaluation phrases.
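To illustrate why a nonparametric Bayesian prior removes the need to fix the number of product attributes in advance, here is a minimal sketch of the Chinese restaurant process, the clustering view of a single Dirichlet process on which hierarchical Dirichlet processes build. It is not the thesis's model: it samples cluster assignments from the prior only and ignores the words entirely. The names `crp_assignments`, `alpha`, and `seed` are hypothetical and chosen for illustration.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster assignments from a Chinese restaurant process prior.

    n_items: number of items (e.g. noun entities) to seat.
    alpha:   concentration parameter (> 0); larger values favor new clusters.
    seed:    seed for the local random generator (illustrative assumption).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    counts = []       # counts[k] = number of items currently in cluster k
    assignments = []  # assignments[i] = cluster index of item i
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new cluster
        else:
            counts[k] += 1     # join an existing cluster
        assignments.append(k)
    return assignments, counts


if __name__ == "__main__":
    assignments, counts = crp_assignments(200, alpha=1.0, seed=42)
    print("number of clusters:", len(counts))
    print("cluster sizes:", counts)
```

The number of clusters is an outcome of sampling rather than an input, and it tends to grow slowly as more items are seated. A full model along the lines of the abstract would additionally score each candidate cluster by how well it explains the observed noun entities.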
[Degree-granting institution]: Southwest University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1
Article ID: 2102546
Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/2102546.html