Research on a Spark-Based Recommender System
Published: 2018-04-20 15:10
Topics: recommender system + collaborative filtering algorithm; Source: Zhejiang Sci-Tech University, master's thesis, 2017
[Abstract]: With the rapid development of the Internet and information technology, massive amounts of data are being generated, and extracting valuable data from this mass of information is a pressing problem. Recommender systems are one effective way to address it. A recommender system is an application that recommends products to a target user based on users' historical behavior and preference information, and it is widely used in e-commerce, video and music portals, and many other fields. However, the problems of data sparsity, cold start, and unsatisfactory prediction accuracy remain. In particular, as the numbers of users and items keep growing, traditional single-machine recommendation algorithms hit a scalability bottleneck and struggle to meet today's commercial demands; parallel implementations on distributed computing platforms offer a new way to solve this problem. Spark is a newer in-memory, general-purpose parallel computing engine for big data, and its strength in iterative parallel computation has drawn wide attention in big data processing. This thesis studies neighborhood-based and model-based recommendation algorithms, improves them to address sparsity, cold start, and unsatisfactory prediction accuracy, and designs and implements the improved algorithms in parallel on a Spark cluster. The specific work is as follows:
(1) To address the poor prediction accuracy of the user-based collaborative filtering algorithm when rating data are sparse, user attribute feature similarity is introduced. When computing user similarity, the thesis combines user attribute feature similarity with user collaborative filtering similarity, which mitigates the effect of rating sparsity on the user similarity computation. The optimized algorithm is implemented on the Spark platform; analysis of the experimental results shows that the optimized user-based collaborative filtering algorithm improves recommendation prediction accuracy and also improves execution efficiency.
(2) To address the poor prediction accuracy of the item-based collaborative filtering algorithm under cold start, item attribute feature similarity is introduced. When computing item similarity, the thesis combines item attribute feature similarity with rating-data similarity, which reduces the negative impact of the cold start problem on the item similarity computation. The optimized algorithm is designed and implemented in parallel on the Spark platform; analysis of the experimental results shows that the optimized item-based collaborative filtering algorithm improves the system's prediction accuracy.
(3) For the recommendation algorithm based on the ALS model, the thesis designs a new objective function that incorporates user and item similarity information obtained before model training. The ALS-based recommendation algorithm is designed and implemented in parallel on the Spark platform; analysis of the experimental results shows that the new objective function gives better prediction accuracy and also improves execution efficiency.
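The abstract says that attribute feature similarity is combined with rating-based similarity but does not say how. As a minimal sketch, assuming a simple linear blend, the idea might look like the following in plain single-machine Python (not Spark). The function names, the mixing weight `alpha`, cosine similarity over co-rated items, and the exact-match attribute measure are all illustrative assumptions, not the thesis's actual formulas.

```python
import math


def rating_similarity(ratings_a, ratings_b):
    """Cosine similarity over the items both users have rated (assumed measure)."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        # No co-rated items: sparse data gives no rating-based signal.
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def attribute_similarity(attrs_a, attrs_b):
    """Fraction of shared attribute keys whose values match (assumed measure)."""
    keys = set(attrs_a) & set(attrs_b)
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if attrs_a[k] == attrs_b[k])
    return matches / len(keys)


def hybrid_similarity(ratings_a, ratings_b, attrs_a, attrs_b, alpha=0.5):
    """Linear blend of the two similarities; alpha is a hypothetical weight in [0, 1]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (alpha * attribute_similarity(attrs_a, attrs_b)
            + (1.0 - alpha) * rating_similarity(ratings_a, ratings_b))
```

With such a blend, two users who share no rated items still get a nonzero similarity from their attributes, which is the effect described in part (1). The same shape would apply to items in part (2), with item attributes in place of user attributes.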
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3
Article ID: 1778289
Article link: http://sikaile.net/shoufeilunwen/xixikjs/1778289.html