
A Hybrid Recommendation Algorithm Based on the MapReduce Framework

Published: 2018-05-30 03:22

Topics: collaborative filtering + hybrid recommender systems; Source: Changchun University of Technology, 2017 master's thesis

[Abstract]: The explosive growth of Internet information, the increasing variety of that information, and the emergence of new e-commerce services have made information overload more and more severe. As a result, recommender systems have become increasingly important among information-filtering tools. In systems used in practice, collaborative filtering is the most widely used personalized recommendation method. However, as recommender systems keep growing in scale, most traditional recommendation algorithms run into serious computational bottlenecks, and the larger volume of data has not significantly improved recommendation accuracy. To cope with the growing scale of data, it is therefore necessary to parallelize collaborative filtering recommendation algorithms. This thesis studies the design and implementation of collaborative filtering recommendation algorithms on the MapReduce parallel computing framework. The algorithms are first parallelized with MapReduce and then optimized individually. For item-based collaborative filtering, a co-occurrence matrix replaces the similarity matrix, which reduces the time spent computing similarities; when the recommendations are computed, a Top-N method selects the nearest neighbors, which reduces the amount of computation. For user-based collaborative filtering, the data are grouped by clustering. Within each group, the users of the same group serve as nearest neighbors to compute a within-group recommendation value, and all the center users serve as neighbors to compute a between-group recommendation value. The three recommendation results (the item-based result and the within-group and between-group results) are taken as training data, and the actual ratings as the output.
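The item-based step described above can be illustrated with a small single-machine sketch. It is not the thesis's MapReduce implementation: the toy `ratings` data, the function names, and the choice to normalize by the sum of co-occurrence counts are all assumptions made for illustration. It only shows the idea of using co-occurrence counts in place of a similarity measure and restricting the computation to the Top-N neighbors.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data (not from the thesis): user -> {item: rating}.
ratings = {
    "u1": {"A": 5.0, "B": 3.0, "C": 4.0},
    "u2": {"A": 4.0, "B": 2.0},
    "u3": {"B": 4.0, "C": 5.0},
}

def build_cooccurrence(ratings):
    """Count, for every item pair, how many users rated both items."""
    cooc = defaultdict(lambda: defaultdict(int))
    for items in ratings.values():
        for a, b in combinations(sorted(items), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc

def top_n_neighbors(cooc, item, n):
    """Return the n items that co-occur most often with `item` (ties broken by name)."""
    ranked = sorted(cooc[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]

def predict(ratings, cooc, user, item, n):
    """Average the user's ratings on the item's Top-N neighbors,
    weighted by co-occurrence count (the normalization is an assumption)."""
    num = den = 0.0
    for neighbor, count in top_n_neighbors(cooc, item, n):
        if neighbor in ratings[user]:
            num += count * ratings[user][neighbor]
            den += count
    return num / den if den else None

cooc = build_cooccurrence(ratings)
print(predict(ratings, cooc, "u2", "C", n=2))
```

In a MapReduce setting, the pair counting in `build_cooccurrence` would typically be split into a map phase that emits item pairs per user and a reduce phase that sums them; that decomposition is a common pattern and is not confirmed by the abstract.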
A linear regression model is fitted to these data. After a loss function is defined for the model, gradient descent is used to find the optimal mixing ratio. Specifically, ten-fold cross-validation splits the data into multiple groups; different Top-N values and data groups yield different mixing parameters, and each set of parameters is then used to compute the mean MAE and RMSE over all data groups. The optimal mixing coefficients and Top-N value are chosen by comparing these means. In the experiments, the three recommendation results produced by the two algorithms above are blended to produce the final recommendations, and the accuracy of the result is verified. The performance of the improved algorithm is also evaluated in terms of program running time. The experimental results show that the modified collaborative filtering algorithm not only improves the ability to process large-scale data but also, by blending different results, improves accuracy. Compared with item-based collaborative filtering, accuracy improves markedly and running time drops markedly; compared with user-based collaborative filtering, accuracy also improves markedly, and grouping reduces the time spent computing the similarity matrix and the results, so efficiency improves markedly as well.
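As a rough illustration of the blending step, the sketch below fits three mixing weights by batch gradient descent and reports MAE and RMSE. The abstract does not state the loss function, learning rate, or whether an intercept is used, so the mean squared error loss, `lr`, `epochs`, the absence of an intercept, and the toy data are all assumptions. The ten-fold cross-validation and the search over Top-N values are omitted.

```python
import math

# Hypothetical toy data (not from the thesis). Each row holds three component
# predictions: (item-based, within-group, between-group); y holds actual ratings.
X = [
    (4.2, 3.8, 3.5),
    (2.9, 3.4, 3.1),
    (4.8, 4.5, 4.0),
    (1.5, 2.2, 2.8),
    (3.6, 3.1, 3.3),
    (2.1, 2.6, 3.0),
]
y = [4.0, 3.0, 5.0, 1.0, 4.0, 2.0]

def blend(weights, preds):
    """Weighted sum of the component predictions for one sample."""
    return sum(w * p for w, p in zip(weights, preds))

def fit_weights(X, y, lr=0.01, epochs=5000):
    """Batch gradient descent on mean squared error (assumed loss, no intercept)."""
    k, m = len(X[0]), len(X)
    w = [1.0 / k] * k  # start from an equal mix
    for _ in range(epochs):
        grad = [0.0] * k
        for preds, target in zip(X, y):
            err = blend(w, preds) - target
            for j in range(k):
                grad[j] += 2.0 * err * preds[j] / m
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def mae(w, X, y):
    """Mean absolute error of the blended predictions."""
    return sum(abs(blend(w, p) - t) for p, t in zip(X, y)) / len(y)

def rmse(w, X, y):
    """Root mean squared error of the blended predictions."""
    return math.sqrt(sum((blend(w, p) - t) ** 2 for p, t in zip(X, y)) / len(y))

w0 = [1.0 / 3.0] * 3
w = fit_weights(X, y)
print("equal mix:", mae(w0, X, y), rmse(w0, X, y))
print("fitted   :", mae(w, X, y), rmse(w, X, y))
```

Under the procedure the abstract describes, a fit like this would be repeated for each fold and each candidate Top-N value, and the configuration with the lowest mean MAE and RMSE would be kept.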
[Degree-granting institution]: Changchun University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3


Article ID: 1953702


Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/1953702.html
