Research on Personalized Recommendation Based on Online Reviews
Topic: online reviews. Entry point: LDA topic model. Source: Nanjing University of Finance and Economics, 2016 master's thesis. Document type: degree thesis.
[Abstract]: With the rapid development of the Internet, we are surrounded by enormous volumes of online information, and this information plays an increasingly important role in daily life. This is especially true in e-commerce, where everyday shopping generates large amounts of product information and review information. Extracting valuable content from this mass of text can greatly improve the consumer's shopping experience and raise the rate at which browsing turns into purchases. The problem has therefore attracted a wave of research in both academia and commercial applications. A recommender system explores users' past behavior data, together with the correlation between that behavior and the attributes of the products themselves, to build a model that predicts future behavior from past behavior. Put simply, in practice it increases business volume by recommending products that users are likely to be interested in.
Earlier recommender systems focused mainly on content-based methods, which compare the attribute features of other products with those of products the user has previously purchased or selected, and recommend the products whose similarity is high. Building on this, the thesis considers not only the descriptive attributes of the product but also ratings and reviews, which improves recommendation accuracy. The thesis first uses a web crawler to collect product information and preprocesses the collected review text (word segmentation and related steps); the words that remain after preprocessing form a dictionary. Because the number of feature words is very large, an improved LDA topic model is used for feature extraction and is combined with TF-IDF to select features at different granularities, mine topic information, and compute the probability distribution and weight of each text over the topics. Finally, in combination with a user interest model, a sigmoid function is used to smooth the transition from attribute features to review features when product similarity is computed under cold-start conditions. The Euclidean distance formula is used to compute the similarity between texts, and the products with the highest similarity are output as the recommendation list.
Book information from the Amazon Chinese website is used as the experimental data. The experiments also examine how changing the number of topics affects the probability distribution of texts over topics. In addition, the thesis evaluates recommendation performance for different choices of feature items and different feature extraction methods, mainly in terms of precision, recall, and F-Measure. The results show that, compared with traditional recommendation methods, the method used in the thesis gives more accurate recommendations once review text is taken into account and the improvements are applied.
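The abstract names the building blocks of the similarity step (a sigmoid transition from attribute features to review features, Euclidean distance, and evaluation by precision, recall, and F-Measure) but does not give the formulas. The following is a minimal sketch under stated assumptions, not the thesis's actual method: the blending rule, the distance-to-similarity mapping `1 / (1 + d)`, the use of the smaller review count of the two products, and the `midpoint` and `steepness` values are all hypothetical choices made for illustration. Topic vectors are assumed to come from an already trained topic model.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def review_weight(n_reviews, midpoint=20.0, steepness=0.25):
    """Share of the similarity taken from review (topic) features.

    Close to 0 for products with few reviews (cold start), close to 1
    for products with many. midpoint and steepness are hypothetical
    tuning values, not taken from the thesis.
    """
    return sigmoid(steepness * (n_reviews - midpoint))


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    """Assumed mapping from a distance in [0, inf) to a similarity in (0, 1]."""
    return 1.0 / (1.0 + euclidean_distance(a, b))


def blended_similarity(p, q):
    """Blend attribute and topic similarity, weighted by review count.

    Assumption: the product with fewer reviews decides the weight.
    """
    w = review_weight(min(p["n_reviews"], q["n_reviews"]))
    attr_sim = similarity(p["attrs"], q["attrs"])
    topic_sim = similarity(p["topics"], q["topics"])
    return (1.0 - w) * attr_sim + w * topic_sim


def recommend(target, candidates, top_n=2):
    """Return the ids of the top_n candidates most similar to target."""
    ranked = sorted(
        candidates,
        key=lambda c: blended_similarity(target, c),
        reverse=True,
    )
    return [c["id"] for c in ranked[:top_n]]


def precision_recall_f(recommended, relevant):
    """Precision, recall, and F-Measure (harmonic mean) of a recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f_measure = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f_measure
```

In this sketch each product is a dictionary with the keys `id`, `attrs`, `topics`, and `n_reviews`. A product with no reviews is ranked almost entirely by attribute similarity, and the ranking shifts toward topic similarity as the review count passes `midpoint`.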
[Degree-granting institution]: Nanjing University of Finance and Economics
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.3; F713.36; F274
Article ID: 1627329
Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/1627329.html