Research on Collaborative Filtering Recommendation Algorithms for Sparse Data
Topic: recommender systems; Focus: data sparsity; Source: Jilin University master's thesis, 2017; Type: degree thesis
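【Abstract】: With the rapid development of the Internet and e-commerce, the amount of information online has grown explosively, giving rise to the phenomenon of "information overload". Personalized recommendation techniques help users find the information they need quickly and accurately among large volumes of disorganized information, which alleviates information overload to some extent. Collaborative filtering (CF), one of the most widely used personalized recommendation techniques, has achieved considerable success in practice; however, because real-world data are usually very sparse, CF suffers from the data sparsity problem. The cold-start problem can be regarded as an extreme case of data sparsity, and this thesis treats it as part of the data sparsity problem. Data sparsity seriously degrades the recommendation quality of CF algorithms. It arises because the numbers of users and items in a recommender system keep growing while each user rates only a few items, so the user rating matrix is inevitably sparse, and CF algorithms depend heavily on that matrix. To address data sparsity, researchers have proposed many methods that operate on the user rating matrix. They fall into two main categories: the first fills in the rating matrix to reduce its sparsity; the second decomposes the rating matrix, removing users and items that have little influence on the similarity computation in order to reduce the matrix's dimensionality. In the second category, the removed information may well contain useful information about users, which harms recommendation quality. This thesis therefore builds on the first category to address data sparsity in recommender systems. The specific work is as follows:
1) For the user cold-start problem, a collaborative filtering algorithm that combines user features with item relationships (User-Item-Mix CF) is proposed. When computing the similarity between users, traditional CF algorithms ignore the relationships between items, which makes the computed user similarity inaccurate. To address this, the thesis proposes a user-similarity method that incorporates item relationships (Item-Based User Sim) to improve the accuracy of user similarity. Building on this improved user similarity, it then adds user feature attributes to the similarity computation and combines them with the item relationships through a dynamic balancing weight, yielding the User-Item-Mix CF algorithm. Finally, User-Item-Mix CF is compared with the mode method on the MovieLens dataset. The results show that, for every number of new users tested, User-Item-Mix CF achieves a lower mean absolute error (MAE) than the mode method.
2) For the data sparsity problem, a collaborative filtering algorithm based on user rating prediction (User-SP CF) is proposed. When computing the similarity between items, the algorithm uses the Item-Based User Sim method to compute the similarity between users and uses the resulting user similarities to fill the unrated entries of the rating matrix, reducing its sparsity; it then finds the nearest-neighbor set of the target item in the filled matrix to produce recommendations. Finally, on the MovieLens dataset, User-SP CF is compared with the CF algorithm based on item rating prediction and with item-based CF. The results show that, for every neighborhood size tested, User-SP CF achieves a lower MAE than both of the other algorithms.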
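The abstract attributes sparsity to a growing number of users and items while each user rates only a few items. Sparsity is commonly measured as the share of empty cells in the user-item rating matrix, and the small Python sketch below computes it (the function names are hypothetical). As an outside illustration, not a figure from the thesis, whose abstract does not say which MovieLens release was used: the widely distributed MovieLens 100K release has 943 users, 1,682 movies and 100,000 ratings, so roughly 93.7% of its rating matrix is empty. The loop shows the stated cause directly: if every user keeps rating about 20 items while the catalog grows, the matrix still gets sparser.

```python
from typing import Dict


def sparsity(num_users: int, num_items: int, num_ratings: int) -> float:
    """Share of empty cells in a num_users x num_items rating matrix."""
    return 1.0 - num_ratings / (num_users * num_items)


def sparsity_of(ratings: Dict[str, Dict[str, float]], num_items: int) -> float:
    """Same measure computed from a sparse user -> {item: rating} mapping."""
    observed = sum(len(row) for row in ratings.values())
    return sparsity(len(ratings), num_items, observed)


if __name__ == "__main__":
    # MovieLens 100K (outside example): 943 users, 1,682 movies, 100,000 ratings.
    print(f"{sparsity(943, 1682, 100_000):.4f}")  # 0.9370

    # 1,000 users each rate 20 items while the catalog grows:
    # the matrix gets sparser even though no user rates less.
    for items in (1_000, 10_000, 100_000):
        print(items, f"{sparsity(1_000, items, 1_000 * 20):.4f}")
```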
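For the first contribution, the abstract states that conventional CF ignores relationships between items when comparing users, and that User-Item-Mix CF adds user feature attributes through a dynamic balancing weight. Neither formula is given in the abstract, so the Python sketch below is not the thesis's method. It shows (a) the conventional co-rated cosine similarity, which returns 0 for two users with no co-rated items however related their items may be, and (b) an assumed fusion scheme in which the weight on rating-based similarity is lam = n / (n + k), with n the number of ratings the user has, so a brand-new user relies entirely on attribute similarity. The attribute encoding, the constant k and the function names are hypothetical.

```python
import math
from typing import Dict, List

# Sparse ratings: user -> {item: rating}; a missing key means "not rated".
Ratings = Dict[str, Dict[str, float]]
# Hypothetical categorical user attributes, e.g. [gender, age band, occupation].
Features = Dict[str, List[str]]


def corated_cosine(ratings: Ratings, u: str, v: str) -> float:
    """Conventional user-user cosine similarity over co-rated items only."""
    ru, rv = ratings.get(u, {}), ratings.get(v, {})
    common = ru.keys() & rv.keys()
    if not common:
        # No co-rated items: this measure cannot see that u and v may have
        # rated closely related items, which is the gap the thesis targets.
        return 0.0
    dot = sum(ru[i] * rv[i] for i in common)
    norm_u = math.sqrt(sum(ru[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(rv[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def feature_similarity(fu: List[str], fv: List[str]) -> float:
    """Hypothetical attribute similarity: share of attributes that match."""
    if not fu:
        return 0.0
    return sum(a == b for a, b in zip(fu, fv)) / len(fu)


def fused_similarity(ratings: Ratings, features: Features, u: str, v: str,
                     k: float = 5.0) -> float:
    """Assumed dynamic weighting: lam = n / (n + k), n = number of ratings of u.

    A new user (n = 0) relies entirely on attribute similarity; as u rates
    more items, the rating-based term takes over.
    """
    n = len(ratings.get(u, {}))
    lam = n / (n + k)
    return (lam * corated_cosine(ratings, u, v)
            + (1 - lam) * feature_similarity(features[u], features[v]))


if __name__ == "__main__":
    ratings = {
        "alice": {"m1": 5, "m2": 3},
        "bob": {"m3": 4},
        "carol": {"m1": 4, "m2": 3},
        "dave": {},  # cold-start user with no ratings yet
    }
    features = {
        "alice": ["F", "25-34", "student"],
        "bob": ["M", "18-24", "student"],
        "carol": ["F", "35-44", "writer"],
        "dave": ["F", "25-34", "engineer"],
    }
    print(corated_cosine(ratings, "alice", "bob"))  # 0.0: no co-rated items
    print(fused_similarity(ratings, features, "dave", "alice"))
```

The n / (n + k) form is only one convenient way to make a weight "dynamic"; any schedule that shifts weight toward rating evidence as it accumulates would match the description equally well.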
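The User-SP CF pipeline outlined in the abstract (fill unrated cells using user-user similarity, then search for the target item's nearest neighbors in the filled matrix) and the MAE metric used in both experiments can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the thesis's implementation: plain co-rated cosine stands in for Item-Based User Sim, whose formula the abstract does not give; each unrated cell is filled with a similarity-weighted average of other users' ratings for that item, falling back to the user's own mean; and the function names and neighborhood size `k` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Rows are users, columns are items; None marks an unrated cell.
Matrix = List[List[Optional[float]]]


def cosine(a: Sequence[Optional[float]], b: Sequence[Optional[float]]) -> float:
    """Cosine similarity over positions where both vectors have a value."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fill_matrix(R: Matrix) -> List[List[float]]:
    """Step 1 (assumed form): fill unrated cells from similar users."""
    n_users, n_items = len(R), len(R[0])
    sims = [[cosine(R[u], R[v]) for v in range(n_users)] for u in range(n_users)]
    filled = []
    for u in range(n_users):
        own = [r for r in R[u] if r is not None]
        fallback = sum(own) / len(own) if own else 0.0
        row = []
        for i in range(n_items):
            if R[u][i] is not None:
                row.append(R[u][i])  # keep observed ratings untouched
                continue
            raters = [v for v in range(n_users) if v != u and R[v][i] is not None]
            num = sum(sims[u][v] * R[v][i] for v in raters)
            den = sum(abs(sims[u][v]) for v in raters)
            row.append(num / den if den > 0 else fallback)
        filled.append(row)
    return filled


def column(F: List[List[float]], j: int) -> List[float]:
    """All users' (filled) ratings for item j."""
    return [row[j] for row in F]


def predict_item_knn(F: List[List[float]], u: int, i: int, k: int = 2) -> float:
    """Step 2: predict r(u, i) from the k items most similar to i in the filled matrix."""
    target = column(F, i)
    scored = [(cosine(target, column(F, j)), j) for j in range(len(F[0])) if j != i]
    neighbors = sorted(scored, reverse=True)[:k]
    num = sum(s * F[u][j] for s, j in neighbors)
    den = sum(abs(s) for s, _ in neighbors)
    return num / den if den > 0 else F[u][i]


def mae(pairs: List[Tuple[float, float]]) -> float:
    """Mean absolute error over (predicted, actual) pairs."""
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


if __name__ == "__main__":
    R = [
        [5, 3, None, 1],
        [4, None, None, 1],
        [1, 1, None, 5],
        [None, 1, 5, 4],
    ]
    F = fill_matrix(R)
    print(round(predict_item_knn(F, u=1, i=1), 2))
    print(mae([(3.0, 4.0), (5.0, 5.0)]))  # 0.5
```

Filling first means the item-item cosine in step 2 is computed over every user rather than only the few who rated both items, which is how filling is meant to blunt the effect of sparsity.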
【Degree-granting institution】: Jilin University
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: TP391.3
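【References】
Related journal articles
1 Deng Ailin, Zhu Yangyong, Shi Bole; Collaborative filtering recommendation algorithm based on item rating prediction [J]; Journal of Software; 2003, issue 9
2 Zhang Guangwei, Li Deyi, Li Peng, Kang Jianchu, Chen Guisheng; Collaborative filtering recommendation algorithm based on cloud model [J]; Journal of Software; 2007, issue 10
3 Xu Hailing, Wu Xiao, Li Xiaodong, Yan Baoping; Comparative study of Internet recommender systems [J]; Journal of Software; 2009, issue 2
4 Ma Hongwei, Zhang Guangwei, Li Peng; Survey of collaborative filtering recommendation algorithms [J]; Journal of Chinese Computer Systems; 2009, issue 7
5 Ji Xiaosheng, Liu Yanbing, Luo Laiming; Similarity measurement method based on user interest degree in collaborative filtering [J]; Journal of Computer Applications; 2010, issue 10
6 Zhang Yufang, Dai Jinlong, Xiong Zhongyang; Collaborative filtering algorithm using stepwise filling to alleviate data sparsity [J]; Application Research of Computers; 2013, issue 9
7 Meng Xiangwu, Liu Shudong, Zhang Yujie, Hu Xun; Research on social recommender systems [J]; Journal of Software; 2015, issue 6
Related doctoral dissertations
1 Sun Xiaohua; Research on the sparsity and cold-start problems of collaborative filtering systems [D]; Zhejiang University; 2005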