A Hybrid Recommendation System Based on Spark
Topic: hybrid recommendation. Focus: Spark. Source: University of Science and Technology of China, master's thesis, 2017
[Abstract]: With the rapid development of information technology, information overload has become a major challenge for the Internet. To ease the growing tension between Internet users and massive volumes of data, researchers proposed the concept of the recommendation system. As an important branch of recommendation systems, hybrid recommendation systems improve performance by combining multiple recommendation algorithms, and they are now widely used in e-commerce, social networks, and video sites. However, the rapid growth in users and data places higher demands on their performance. For example, a video site requires its hybrid recommendation system to recommend videos of all kinds accurately, to train new models as user behavior changes, and to update its recommendations promptly. As data volume grows, developers find it hard to judge from experience how much each recommendation algorithm should influence the final result. Coarse-grained weight calculation therefore limits the precision of a hybrid recommendation system and makes development harder. In addition, because the system trains feature models on large-scale data and training involves many iterative computations, training a model once takes a day or even several days, which falls short of the efficiency users expect. This thesis analyzes the characteristics of different data sets, recommendation algorithms, and weight calculation methods, introduces Spark, a general-purpose large-scale data processing platform suited to iterative computation, and designs and implements a Spark-based hybrid recommendation system to improve precision, diversity, and efficiency. The main work and contributions are as follows:
1. First, the thesis proposes a fine-grained weight calculation method that extends the single weight of each recommendation algorithm to a weight vector. The method improves the precision of rating-prediction recommendation and effectively alleviates the cold-start problem caused by data sparsity.
2. Second, building on the large-scale data processing framework Spark and taking the fine-grained weight calculation method as its core, the thesis designs and implements a fine-grained weight hybrid subsystem. The subsystem uses the distributed computing framework Spark to reduce model training time and the fine-grained weight calculation method to improve recommendation precision. Experiments show that fine-grained weighted hybrid recommendation is 5% to 30% more precise than a single recommendation algorithm and 1.5% to 3% more precise than coarse-grained weighted hybrid recommendation. The system's model training is also 90% faster than that of a single-machine recommendation system and about 2 times faster than that of a recommendation system based on the Hadoop framework.
3. Finally, the thesis designs and implements a Spark-based cross-blending recommendation system. With the fine-grained weight hybrid subsystem at its core, the system adds a content-based recommendation algorithm to deliver a hybrid recommendation system that is precise, efficient, diverse, and scalable.
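As a rough illustration of the difference between coarse-grained and fine-grained weighting, the sketch below blends the rating predictions of two recommendation algorithms. The abstract does not say how the weight vector is indexed, so the sketch assumes, purely for illustration, one weight per user group (for example, active users versus cold-start users). The function names, the grouping, and all numbers are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def coarse_blend(preds: Sequence[float], weights: Sequence[float]) -> float:
    """Coarse-grained hybrid: one scalar weight per algorithm for every user."""
    return sum(w * p for w, p in zip(weights, preds))


def fine_blend(
    preds: Sequence[float],
    weight_vectors: Sequence[Sequence[float]],
    group: int,
) -> float:
    """Fine-grained hybrid: each algorithm carries a weight vector.

    Assumption (not from the thesis): the vector is indexed by user group,
    so `group` selects which entry of each algorithm's vector applies.
    """
    return sum(wv[group] * p for wv, p in zip(weight_vectors, preds))


# Hypothetical predictions for one (user, item) pair from two algorithms,
# e.g. a collaborative-filtering model and a second model.
preds = [4.0, 3.0]

# Coarse-grained: both algorithms always count equally.
coarse_score = coarse_blend(preds, [0.5, 0.5])

# Fine-grained: group 0 (active users) trusts algorithm 0 more,
# group 1 (cold-start users) trusts algorithm 1 more.
weight_vectors = [
    [0.75, 0.25],  # algorithm 0: weights for group 0 and group 1
    [0.25, 0.75],  # algorithm 1: weights for group 0 and group 1
]
active_score = fine_blend(preds, weight_vectors, group=0)
cold_start_score = fine_blend(preds, weight_vectors, group=1)

print(coarse_score, active_score, cold_start_score)
```

With scalar weights every user receives the same blend, whereas the weight vectors let the blend shift toward whichever algorithm is assumed to be more reliable for a given group. How the thesis actually learns and indexes its weight vectors is not stated in the abstract.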
[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3