Sentiment Analysis of OTA Website Review Text
Topic: OTA  Focus: sentiment analysis  Source: Yunnan University of Finance and Economics, 2017 master's thesis  Document type: degree thesis
【Abstract】: With economic development, demand for travel keeps rising, and the online travel market has grown explosively. As a result, online travel agency (OTA) websites such as Ctrip and Qunar have accumulated massive amounts of user review text. How to extract useful information from this text in order to improve user experience has become a pressing problem. This thesis studies sentiment analysis of OTA website review text. The work is as follows. First, a web crawler was used to collect a body of review text from travel OTA websites as the research object, and a corresponding classification lexicon and sentiment lexicon were built. Because OTA review text is domain-specific, existing open-source sentiment lexicons do not match it very well. Moreover, most mainstream sentiment lexicons support only binary sentiment classification, so they cannot express the degree of a user's sentiment; nor can they break sentiment down by evaluation aspect or derive a personalized sentiment degree according to user preference. The thesis therefore builds a dedicated classification lexicon, subdivided by evaluation aspect, and a sentiment lexicon for this kind of review text, so that users' sentiment orientation values can be obtained more effectively. These lexicons are also an indispensable part of the model construction. Second, the thesis proposes a deep learning model based on LSA (latent semantic analysis) and DBN (deep belief network). A text feature matrix built in the traditional vector space model reflects only word-frequency information and omits the latent semantic relations between words (such as polysemy, where one word has several meanings, and synonymy, where several words share one meaning), so models fitted on it often underperform. The thesis therefore applies LSA: the original text feature matrix is decomposed by SVD, and the matrix is then reconstructed from a suitably chosen number of singular values. A DBN deep learning model is built on the reconstructed text feature matrix, with the aim of learning the sentiment orientation value of a text from training data. Finally, six groups of comparison experiments over data and models were designed to verify the model's effectiveness. Judging from the overall comparison of ten-fold cross-validation results across the models, the proposed deep learning model based on LSA and DBN performs relatively well.
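The LSA step described in the abstract (SVD of the text feature matrix, then reconstruction from a chosen number of singular values) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the thesis's implementation: the toy matrix, the function names `lsa_reconstruct` and `choose_k`, and the cumulative-energy rule for picking the number of singular values are all assumptions. The abstract only says that the number of singular values is chosen reasonably.

```python
import numpy as np


def choose_k(singular_values, energy=0.85):
    """Smallest k whose leading singular values hold `energy` of the total
    squared singular-value mass (an assumed heuristic, not from the thesis)."""
    sq = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(sq) / sq.sum()
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(sq))


def lsa_reconstruct(X, k):
    """Rank-k reconstruction of a document-term matrix via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    # Keep only the k largest singular values and their singular vectors.
    X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    return X_k, s


# Toy document-term count matrix (rows: reviews, columns: terms).
# The values are illustrative only.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

_, singular_values = lsa_reconstruct(X, k=X.shape[1])
k = choose_k(singular_values, energy=0.85)
X_k, _ = lsa_reconstruct(X, k)
```

The reconstructed matrix `X_k` has the same shape as `X` but lower rank, so terms that co-occur across reviews receive similar weights. In a pipeline like the one the abstract outlines, a matrix of this kind would be the input to the downstream classifier.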
【Degree-granting institution】: Yunnan University of Finance and Economics
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: TP391.1
Article ID: 1559390
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1559390.html