Research and Implementation of a Hybrid Recommendation System Based on Spark
Published: 2018-06-27 05:07
Topics: recommendation system + Spark platform; Source: Beijing Jiaotong University, master's thesis, 2017
[Abstract]: In the era of big data, recommendation systems have become an indispensable tool for tackling information overload. On one hand, users rely on them to filter useful information out of massive data and obtain strong decision support. On the other hand, e-commerce and multimedia service providers that offer recommendations hope to use them for targeted, personalized marketing that raises revenue. Recommendation systems have advanced rapidly over the past decade, but they still face many challenges, such as the storage, computation, and scalability problems of massive data, inherent data sparsity, and poor timeliness. To address these problems, this thesis studies and implements a hybrid recommendation system for the movie domain on the Spark platform. First, it studies the commonly used matrix factorization methods and proposes a hybrid matrix factorization recommendation algorithm that incorporates a time factor and neighborhood information. The algorithm accounts for how the interests of the group a user belongs to drift over time, and it minimizes the loss function with momentum gradient descent, which speeds up parameter estimation and improves prediction accuracy. Second, for the similarity computation problem in collaborative filtering, it proposes an improved Pearson-coefficient similarity measure that accounts for item popularity and individual rating bias. Experiments show that this measure effectively reduces the algorithm's root mean square error (RMSE). Third, to address timeliness, the thesis adopts an incremental ALS matrix factorization algorithm: for newly acquired information, it modifies the model locally rather than retraining it, which saves substantial computation. Experiments show that incremental ALS offers faster interaction speed and higher accuracy than the currently popular incremental SGD, effectively improving the system's responsiveness. Finally, the thesis designs and implements a movie recommendation system on the Spark platform, comprising log collection, data processing, and a hybrid recommendation engine as its main modules, and integrates the optimizations above to mitigate the main problems that current recommendation systems face.
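The idea behind the incremental ALS update, modifying the model locally rather than retraining it, can be illustrated with a minimal sketch. It assumes a plain regularized matrix factorization model and shows only the standard per-user least-squares step of ALS (alternating least squares) applied to a single user whose ratings changed. It is not the thesis's algorithm, which the abstract does not specify in detail, and the names `fold_in_user`, `item_factors`, and `reg` are hypothetical.

```python
import numpy as np


def fold_in_user(item_factors, rated_items, ratings, reg=0.1):
    """Recompute one user's latent vector with the item factors held fixed.

    This is the per-user regularized least-squares step of ALS, applied only
    to the affected user instead of retraining the whole model.
    """
    V = item_factors[rated_items]              # (n_rated, k) factors of the rated items
    k = V.shape[1]
    A = V.T @ V + reg * np.eye(k)              # regularized normal equations
    b = V.T @ np.asarray(ratings, dtype=float)
    return np.linalg.solve(A, b)


# Hypothetical demo data: 6 items with 3 latent factors each.
rng = np.random.default_rng(0)
item_factors = rng.normal(size=(6, 3))
true_user = np.array([1.0, -0.5, 2.0])

# The user has rated items 0, 2, 3 and 5; ratings are noise-free for the demo.
rated = np.array([0, 2, 3, 5])
ratings = item_factors[rated] @ true_user

# Update only this user's vector, then score every item for recommendation.
user_vec = fold_in_user(item_factors, rated, ratings, reg=1e-8)
predicted = item_factors @ user_vec
```

The cost of this update depends on the number of items the user has rated and on the factor dimension, not on the size of the full rating matrix, which is why a local update of this kind is much cheaper than retraining.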
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3
Link to this article: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2072776.html