Research and Implementation of a Recommender System Based on Mahout and Hadoop

Published: 2018-03-11 05:05

  Topic: recommender systems  Approach: collaborative filtering  Source: Yangtze University, 2016 master's thesis  Type: degree thesis


[Abstract]: With the rapid growth of the Internet in recent years, e-commerce in particular, the volume of data and information has exploded, making it harder to pick out, from a huge number of products, the ones a target user actually needs. To meet this need, a careful study of recommender systems, which play an increasingly important role in today's society, has considerable practical significance. Improving recommendation accuracy can bring large economic benefits to the businesses that deploy such systems while giving their users more personalized and convenient service. Collaborative filtering algorithms have many successful applications in recommender systems, but they perform poorly on sparse data. Starting from the basic concepts of recommendation algorithms, this thesis discusses collaborative filtering algorithms built on several different similarity measures, proposes a similarity measure based on the Bhattacharyya coefficient, and verifies its effectiveness experimentally on the MovieLens, Netflix, and Yahoo Music open datasets. Because a recommender system is data-intensive and its data can easily grow explosively, the thesis also addresses the massive-data setting: it analyzes the computing principles of the Hadoop distributed computing platform, describes in detail the recommendation component of the well-known machine learning framework Mahout, and explains how Mahout eases the implementation of the proposed Bhattacharyya-coefficient-based collaborative filtering algorithm and how it can be combined with Hadoop. Finally, the thesis designs and implements a system prototype. It describes in detail how the proposed algorithm is implemented in Mahout and gives the source code. Then, given the demands of long-term operation, it gives a concrete plan and steps for migrating the system from a single-machine environment to the Hadoop distributed computing platform, using Mahout together with Hadoop to address the computing and storage bottlenecks caused by massive data.
In summary, the contributions of this thesis are twofold: 1) Collaborative filtering relies too heavily on co-rated data, so its recommendations are inaccurate in sparse-data settings. To address this, the thesis proposes a new Bhattacharyya-coefficient-based similarity measure for use in collaborative filtering and demonstrates, through analysis of experiments on open datasets, that it is effective in sparse settings. 2) For practical use, the Mahout library is extended with the Bhattacharyya-coefficient-based collaborative filtering algorithm studied here, and source code for the key parts is given.
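
As a rough illustration of the similarity measure named in the abstract (the Bhattacharyya coefficient, 巴氏系数 in the original), the sketch below computes the textbook coefficient between two rating histograms: the sum over rating levels of sqrt(p_i * q_i). The abstract does not give the thesis's exact formulation or how it is wired into Mahout, so the function names, the 1-to-5 rating scale, and the choice to compare two users' rating lists are all assumptions made for illustration. The point it shows is that the coefficient compares whole rating distributions and so does not require any co-rated items.

```python
from collections import Counter
from math import sqrt

# Assumption: ratings are integers on a 1..5 scale (as in MovieLens-style data).
LEVELS = (1, 2, 3, 4, 5)


def rating_distribution(ratings, levels=LEVELS):
    """Return the fraction of ratings falling on each level, as a dict."""
    if not ratings:
        raise ValueError("need at least one rating")
    counts = Counter(ratings)
    total = len(ratings)
    return {level: counts.get(level, 0) / total for level in levels}


def bhattacharyya_coefficient(ratings_a, ratings_b, levels=LEVELS):
    """Textbook Bhattacharyya coefficient between two rating histograms.

    Ranges from 0 (no overlap between the distributions) to 1 (identical
    distributions). Hypothetical helper, not the thesis's implementation.
    """
    p = rating_distribution(ratings_a, levels)
    q = rating_distribution(ratings_b, levels)
    return sum(sqrt(p[level] * q[level]) for level in levels)


if __name__ == "__main__":
    # The two lists need not refer to the same items.
    alice = [5, 4, 5, 4]
    bob = [4, 5, 5, 4, 4, 5]
    carol = [1, 2, 1]
    print(bhattacharyya_coefficient(alice, bob))
    print(bhattacharyya_coefficient(alice, carol))
```

Because the inputs are plain lists of ratings, two users (or two items) can be compared even when their rated sets are disjoint, which is the sparse-data situation the abstract says conventional co-rating-based similarities handle poorly.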
[Degree-granting institution]: Yangtze University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.3


Article number: 1596692


Link to this article: http://sikaile.net/jingjilunwen/dianzishangwulunwen/1596692.html

