Research on a Spark-Based Recommender System
Published: 2018-04-20 15:10
Topics: recommender systems + collaborative filtering algorithms; Source: Zhejiang Sci-Tech University, master's thesis, 2017
[Abstract]: With the rapid development of the Internet and information technology, massive amounts of data are being generated, and extracting valuable information from this complex mass of data is a pressing problem. Recommender systems are one effective way to address it. A recommender system recommends products to a target user based on that user's historical behavior and preferences. Such systems are widely used in e-commerce, video and music portals, and many other domains. However, they still suffer from data sparsity, the cold-start problem, and unsatisfactory prediction accuracy. In particular, as the numbers of users and items keep growing, traditional single-machine recommendation algorithms hit a scalability bottleneck and can hardly meet today's business needs. Parallel implementations on distributed computing platforms offer a new way around this. Spark is a newer, general-purpose, in-memory engine for parallel big-data computation. It has attracted wide attention in big-data processing because of its advantages for iterative parallel computation. This thesis studies neighborhood-based and model-based recommendation algorithms and improves them to address sparsity, cold start, and unsatisfactory prediction accuracy. It then designs and implements the improved algorithms in parallel on a Spark cluster. The specific contributions are as follows:
(1) User-based collaborative filtering predicts poorly when rating data is sparse, so this thesis introduces a user attribute-feature similarity. User similarity is computed by combining the attribute-feature similarity with the rating-based collaborative-filtering similarity, which mitigates the effect of rating sparsity on the similarity computation. The optimized algorithm is implemented on the Spark platform. The experimental results show that it improves both recommendation prediction accuracy and execution efficiency.
(2) Item-based collaborative filtering predicts poorly under cold start, so this thesis introduces an item attribute-feature similarity. Item similarity is computed by combining the attribute-feature similarity with the rating-data similarity, which reduces the negative effect of cold start on the item similarity computation. The optimized algorithm is designed and implemented in parallel on the Spark platform. The experimental results show that it improves the system's prediction accuracy.
(3) For the ALS (alternating least squares) model-based recommendation algorithm, this thesis designs a new objective function that incorporates user and item similarity information computed before model training. The ALS-based algorithm is designed and implemented in parallel on the Spark platform. The experimental results show that under the new objective function it achieves good prediction accuracy and also runs more efficiently.
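The abstract does not say which rating similarity or attribute measure contribution (1) uses, or how the two are weighted. The Python sketch below illustrates the idea under assumptions that are not from the thesis:

- Rating similarity is cosine similarity over co-rated items.
- Attribute similarity is the share of matching categorical fields (age group, gender).
- The two are blended linearly with a hypothetical weight `alpha`.
- A prediction is a mean-centered weighted average over the top `k` neighbors.

It runs on one machine with the standard library only; the thesis runs this step in parallel on Spark.

```python
from math import sqrt

def rating_similarity(ru, rv):
    """Cosine similarity over the items both users rated (0.0 if they share none)."""
    common = ru.keys() & rv.keys()
    if not common:
        return 0.0
    dot = sum(ru[i] * rv[i] for i in common)
    norm_u = sqrt(sum(ru[i] ** 2 for i in common))
    norm_v = sqrt(sum(rv[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def attribute_similarity(au, av):
    """Hypothetical attribute measure: share of common fields with equal values."""
    fields = au.keys() & av.keys()
    if not fields:
        return 0.0
    return sum(au[f] == av[f] for f in fields) / len(fields)

def combined_similarity(u, v, ratings, attrs, alpha=0.3):
    """Linear blend; alpha (an assumed value) is the weight of the attribute term."""
    return (alpha * attribute_similarity(attrs[u], attrs[v])
            + (1 - alpha) * rating_similarity(ratings[u], ratings[v]))

def mean(r):
    return sum(r.values()) / len(r) if r else 0.0

def predict_user_based(u, item, ratings, attrs, k=2, alpha=0.3):
    """Mean-centered weighted average over the k most similar users who rated `item`."""
    candidates = [(combined_similarity(u, v, ratings, attrs, alpha), v)
                  for v in ratings if v != u and item in ratings[v]]
    top = sorted(candidates, reverse=True)[:k]
    weight = sum(abs(s) for s, _ in top)
    if weight == 0:
        return mean(ratings[u])  # no usable neighbors: fall back to the user's mean
    offset = sum(s * (ratings[v][item] - mean(ratings[v])) for s, v in top)
    return mean(ratings[u]) + offset / weight

# Hypothetical toy data: 1-5 star ratings and two categorical user attributes.
ratings = {
    "alice": {"m1": 5, "m2": 3},
    "bob": {"m1": 4, "m2": 3, "m3": 4},
    "carol": {"m1": 1, "m3": 2},
}
attrs = {
    "alice": {"age": "18-24", "gender": "F"},
    "bob": {"age": "18-24", "gender": "M"},
    "carol": {"age": "45-49", "gender": "F"},
}
print(round(predict_user_based("alice", "m3", ratings, attrs), 3))  # 4.417
```

Setting `alpha=0` recovers plain rating-based user similarity. In the toy data, carol shares a single rated movie with alice, so her rating similarity is exactly 1.0 from one data point. The attribute term adds a second source of evidence that does not depend on co-rated items.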
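Here is a similarly hedged sketch for contribution (2). It makes these assumptions, and the thesis may use different measures:

- Items carry tag sets such as genres.
- The attribute part uses Jaccard similarity.
- The rating part uses cosine similarity over users who rated both items.
- The two are mixed with a hypothetical weight `beta`.

What it demonstrates is the cold-start mechanism. An item with no ratings has zero rating similarity to every other item, so pure item-based CF (`beta=0.0`) cannot score it. The attribute term still produces a prediction.

```python
from math import sqrt

def jaccard(a, b):
    """Jaccard similarity of two tag sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rating_cosine(ri, rj):
    """Cosine similarity over users who rated both items (0.0 if there are none)."""
    common = ri.keys() & rj.keys()
    if not common:
        return 0.0
    dot = sum(ri[u] * rj[u] for u in common)
    return dot / (sqrt(sum(ri[u] ** 2 for u in common)) * sqrt(sum(rj[u] ** 2 for u in common)))

def invert(user_ratings):
    """Turn {user: {item: r}} into {item: {user: r}}."""
    item_ratings = {}
    for user, items in user_ratings.items():
        for item, r in items.items():
            item_ratings.setdefault(item, {})[user] = r
    return item_ratings

def item_similarity(i, j, item_ratings, tags, beta=0.5):
    """Blend of tag similarity and rating similarity; beta is an assumed weight."""
    return (beta * jaccard(tags[i], tags[j])
            + (1 - beta) * rating_cosine(item_ratings.get(i, {}), item_ratings.get(j, {})))

def predict_item_based(user, target, user_ratings, tags, k=2, beta=0.5):
    """Similarity-weighted average of the user's ratings on the k items most similar to target."""
    item_ratings = invert(user_ratings)
    scored = sorted(((item_similarity(target, j, item_ratings, tags, beta), j)
                     for j in user_ratings[user] if j != target), reverse=True)[:k]
    weight = sum(s for s, _ in scored)  # with positive ratings every similarity is >= 0
    if weight == 0:
        return None  # nothing to base a prediction on
    return sum(s * user_ratings[user][j] for s, j in scored) / weight

# Hypothetical toy data: "m_new" has tags but no ratings yet.
user_ratings = {"alice": {"m1": 5, "m2": 2}, "bob": {"m1": 4, "m2": 1}}
tags = {"m1": {"sci-fi", "action"}, "m2": {"romance"}, "m_new": {"sci-fi"}}
print(predict_item_based("alice", "m_new", user_ratings, tags))            # 5.0
print(predict_item_based("alice", "m_new", user_ratings, tags, beta=0.0))  # None
```
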
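The abstract does not give the new objective function of contribution (3). One common way to inject precomputed similarities into matrix factorization is a similarity regularizer that pulls the latent vectors of similar users (or items) together. The sketch below assumes exactly that form; it is an illustration, not the thesis's function:

L = Σ_(u,i) (r_ui − p_u·q_i)² + λ(Σ_u ‖p_u‖² + Σ_i ‖q_i‖²) + γ Σ_(u<v) s_uv ‖p_u − p_v‖² + γ Σ_(i<j) s_ij ‖q_i − q_j‖²

Setting the gradient with respect to one user vector to zero gives the ALS update (Σ_i q_i q_iᵀ + (λ + γ Σ_v s_uv) I) p_u = Σ_i r_ui q_i + γ Σ_v s_uv p_v, and symmetrically for items. The parameter names `lam`, `gamma`, and `rank`, the helper functions, and the toy data are hypothetical. The code uses a plain-Python linear solver instead of Spark or NumPy so that it is self-contained.

```python
import random

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def update_block(vecs, key, observed, fixed, sims, lam, gamma):
    """Exact minimizer of the assumed objective over vecs[key], all other vectors fixed:
    (sum_i q_i q_i^T + (lam + gamma * sum_v s_v) I) x = sum_i r_i q_i + gamma * sum_v s_v x_v
    """
    rank = len(vecs[key])
    nbrs = sims.get(key, {})
    diag = lam + gamma * sum(nbrs.values())
    a = [[diag if r == c else 0.0 for c in range(rank)] for r in range(rank)]
    b = [0.0] * rank
    for other, rating in observed.items():
        q = fixed[other]
        for r in range(rank):
            b[r] += rating * q[r]
            for c in range(rank):
                a[r][c] += q[r] * q[c]
    for nb, s in nbrs.items():  # similarity term pulls x toward its neighbors
        for r in range(rank):
            b[r] += gamma * s * vecs[nb][r]
    vecs[key] = solve(a, b)

def als_with_similarity(ratings, user_sims, item_sims, rank=2, lam=0.1, gamma=0.5, iters=10, seed=0):
    """ratings: {(user, item): r}; *_sims: symmetric {key: {neighbor: s}} over rated keys."""
    by_user, by_item = {}, {}
    for (u, i), r in ratings.items():
        by_user.setdefault(u, {})[i] = r
        by_item.setdefault(i, {})[u] = r
    rng = random.Random(seed)
    P = {u: [rng.uniform(0, 1) for _ in range(rank)] for u in by_user}
    Q = {i: [rng.uniform(0, 1) for _ in range(rank)] for i in by_item}
    for _ in range(iters):
        for u in P:  # user half-step: item factors fixed
            update_block(P, u, by_user[u], Q, user_sims, lam, gamma)
        for i in Q:  # item half-step: user factors fixed
            update_block(Q, i, by_item[i], P, item_sims, lam, gamma)
    return P, Q

def objective(ratings, P, Q, user_sims, item_sims, lam, gamma):
    """Value of the assumed objective; each symmetric pair appears twice in *_sims, hence / 2."""
    loss = sum((r - dot(P[u], Q[i])) ** 2 for (u, i), r in ratings.items())
    loss += lam * (sum(dot(p, p) for p in P.values()) + sum(dot(q, q) for q in Q.values()))
    for vecs, sims in ((P, user_sims), (Q, item_sims)):
        for k1, nbrs in sims.items():
            for k2, s in nbrs.items():
                d = [x - y for x, y in zip(vecs[k1], vecs[k2])]
                loss += gamma * s * dot(d, d) / 2
    return loss

# Hypothetical toy data; the similarities would come from a pre-training step.
ratings = {("alice", "m1"): 5, ("alice", "m2"): 3, ("bob", "m1"): 4,
           ("bob", "m3"): 4, ("carol", "m2"): 2, ("carol", "m3"): 1}
user_sims = {"alice": {"bob": 0.8}, "bob": {"alice": 0.8}}
item_sims = {"m1": {"m3": 0.6}, "m3": {"m1": 0.6}}
P, Q = als_with_similarity(ratings, user_sims, item_sims)
print(round(dot(P["alice"], Q["m1"]), 2))  # model's estimate for a training pair
```

Each update is the exact minimizer of L over one vector with all others held fixed. The objective can therefore only decrease (or stay flat) from sweep to sweep, which is a convenient sanity check when experimenting with `gamma`. Setting `gamma=0` reduces the code to ordinary L2-regularized ALS.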
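The abstract states that the neighborhood-based algorithms are parallelized on Spark but not how the data flows. A common pattern for item-item cosine similarity, assumed here, has three steps:

1. For each user, emit every pair of items that user rated, together with the partial sums (r_i·r_j, r_i², r_j²).
2. Aggregate those sums per item pair.
3. Divide to get the cosine.

In PySpark this corresponds to an RDD `flatMap` followed by `reduceByKey`. The sketch below simulates those two phases in plain Python so it runs without a cluster; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def emit_pairs(items):
    """Map step for one user: ((i, j), (r_i * r_j, r_i ** 2, r_j ** 2)) per co-rated pair."""
    for (i, ri), (j, rj) in combinations(sorted(items.items()), 2):
        yield (i, j), (ri * rj, ri * ri, rj * rj)

def item_cosine_mapreduce(user_ratings):
    """Item-item cosine similarity from per-user partial sums (ratings assumed > 0)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0.0])
    for items in user_ratings.values():          # like rdd.flatMap(...) over users
        for pair, partial in emit_pairs(items):  # like reduceByKey with element-wise +
            acc = sums[pair]
            for n in range(3):
                acc[n] += partial[n]
    return {pair: d / (sqrt(a) * sqrt(b)) for pair, (d, a, b) in sums.items()}

# Hypothetical toy data.
user_ratings = {"alice": {"m1": 5, "m2": 2}, "bob": {"m1": 4, "m2": 1, "m3": 3}}
for pair, s in sorted(item_cosine_mapreduce(user_ratings).items()):
    print(pair, round(s, 4))
```

Emitting all pairs per user is quadratic in that user's number of ratings. Implementations of this pattern therefore often cap or sample very active users before the shuffle.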
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.3
Article ID: 1778289
Article link: http://sikaile.net/shoufeilunwen/xixikjs/1778289.html