Research on Collaborative Filtering Recommendation Algorithms for the Data Sparsity Problem from a Statistical Perspective
Source: Chongqing Technology and Business University, master's thesis, 2016. Document type: degree thesis
Keywords: statistics; collaborative filtering; recommendation algorithm; data sparsity
[Abstract]: With the spread of the Internet and the rapid growth of e-commerce, information resources have grown explosively, and it has become increasingly difficult for users to find the information or goods they want quickly and accurately among massive resources. Recommender systems emerged to solve this problem. Recommendation algorithms have always been the core technology of recommender systems, and collaborative filtering is currently the most successful and most widely used of them. It makes recommendations mainly from the ratings that users leave online. In practice, however, the numbers of users and items are very large while each user rates only a small fraction of the items they have encountered, which causes a severe data sparsity problem. This problem is one of the main reasons for the poor accuracy of traditional collaborative filtering algorithms. This thesis studies collaborative filtering from a statistical perspective, with a focus on data sparsity. It implements simple recommendation based on descriptive statistics and examines the effect of applying statistic-based filling, cluster analysis, and matrix factorization to collaborative recommendation. Building on a detailed analysis of the causes of data sparsity and the ways it affects collaborative recommendation, the thesis proposes filling missing ratings with statistics to alleviate sparsity. It then clusters users with K-Means, chooses the number of clusters by the silhouette coefficient, and fills each cluster's missing ratings with a rating statistic of that cluster as a fixed value. Besides fixed-value filling, the thesis uses singular value decomposition (SVD) dimensionality reduction to predict ratings, fills the original matrix with the predicted ratings to form a new user-item rating matrix, and then performs collaborative recommendation. Finally, from the perspective of correcting the recommendation process, it improves the traditional user similarity computation by weighting: user similarity is computed as a weighted combination of user preference similarity and user rating similarity. The methods are tested on the MovieLens dataset, using mean absolute error (MAE) to compare how much each one improves the recommendation algorithm. The algorithms are implemented mainly with Excel and R. The experiments show that all of the proposed methods alleviate the data sparsity problem to some extent and thereby improve recommendation quality. Statistic-based filling, clustering, and similarity computation are all basic statistical methods. Applying statistics to recommendation should not focus only on complicated models; bringing basic statistical methods into recommendation research can also effectively address the problems these algorithms face. In the future, statistical methods will be applied in more fields and will develop further.
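As a rough illustration of the fill-then-recommend idea in the abstract, the sketch below fills missing ratings with a simple statistic, predicts a rating from the most similar users, and scores the predictions with MAE. It is not the thesis's implementation, which used Excel and R. The choice of the item mean as the filling statistic, cosine similarity, the neighbourhood size `k`, the toy rating matrix, and all function names are assumptions made for this example.

```python
from math import sqrt

def fill_with_item_means(ratings):
    """Replace missing ratings (None) with the item's mean observed rating."""
    n_items = len(ratings[0])
    observed = [r for row in ratings for r in row if r is not None]
    global_mean = sum(observed) / len(observed)
    item_means = []
    for j in range(n_items):
        col = [row[j] for row in ratings if row[j] is not None]
        # Fall back to the global mean for items nobody has rated.
        item_means.append(sum(col) / len(col) if col else global_mean)
    return [[item_means[j] if r is None else r for j, r in enumerate(row)]
            for row in ratings]

def cosine(u, v):
    """Cosine similarity between two rating vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def predict(filled, user, item, k=2):
    """Similarity-weighted average of the k most similar users' ratings."""
    sims = [(cosine(filled[user], filled[other]), other)
            for other in range(len(filled)) if other != user]
    top = sorted(sims, reverse=True)[:k]
    total = sum(s for s, _ in top)
    if total == 0:
        return filled[user][item]
    return sum(s * filled[other][item] for s, other in top) / total

def mae(pairs):
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - a) for p, a in pairs) / len(pairs)

# Hypothetical toy data: rows are users, columns are items, None is unrated.
ratings = [
    [5, 4, None, 1],
    [4, None, 4, 1],
    [1, 2, None, 5],
    [None, 1, 2, 4],
]
# Hypothetical held-out test ratings as (user, item, true rating).
held_out = [(0, 2, 4), (2, 2, 1)]

filled = fill_with_item_means(ratings)
pairs = [(predict(filled, u, i), r) for u, i, r in held_out]
score = mae(pairs)
print(f"MAE on held-out ratings: {score:.3f}")
```

Swapping `fill_with_item_means` for a per-cluster statistic or for SVD-based predictions would change only the filling step; the prediction and MAE steps stay the same, which is what makes MAE a convenient common yardstick for comparing filling strategies.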
[Degree-granting institution]: Chongqing Technology and Business University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.3; F713.36
Article ID: 1438074
Article link: http://sikaile.net/jingjilunwen/guojimaoyilunwen/1438074.html