Research and Implementation of a Personalized Recommendation System Based on the Hadoop Big Data Framework
Published: 2018-11-22 19:42
[Abstract]: Information overload is an increasingly prominent problem. Three relatively mature approaches currently address it: website navigation, search engines, and recommendation systems. Website navigation tackles information overload by collecting well-known websites and organizing them into categories, while search engines do so by building indexes over massive numbers of web pages. When users cannot clearly articulate their needs, however, both approaches fall short, and this is the case a recommendation system can handle. A recommendation system analyzes a user's historical behavior records and proactively recommends content the user is likely to be interested in. Yet with the rapid growth of the Internet, the volume of information has grown geometrically, and traditional recommendation systems tend to hit computational bottlenecks on massive data. Traditional systems also do not adequately account for the fact that user interests change frequently and are somewhat scattered. To address these problems, this thesis draws on earlier recommendation system designs and, targeting personalized book recommendation in a search engine setting, studies and implements a hybrid recommendation scheme based on latent semantic analysis and segmented clustering. It uses the Hadoop big data processing framework to handle the system's massive data processing. The thesis first studies how to collect user behavior data in a search engine setting: it analyzes the types of user behavior and their characteristics, and applies a different collection method and normalization method to each data type, thereby completing the collection of user behavior data.
Second, to address the distinctive nature of user behavior in a search engine setting and the variability of user interests, the thesis proposes a latent semantic analysis model and a segmented clustering model that mine users' long-term interests and immediate interests, respectively, from large-scale user behavior data. The latent semantic analysis recommendation model recommends by content, which alleviates the cold-start problem for both users and books and raises recommendation coverage. The collaborative filtering recommendation model based on segmented clustering partitions user behavior by attribute and content, which extracts a user's interests in different periods, further improves recommendation performance, and gives the results a degree of novelty. In addition, for computing user similarity during segmented clustering, the thesis proposes an improved mixed-type data similarity measure based on users' search terms. Finally, building on the Hadoop big data processing framework, the thesis studies how to parallelize user behavior preprocessing and the recommendation algorithms, and completes the design and implementation of a personalized book recommendation system for the search engine setting. Introducing the Hadoop platform and designing parallel recommendation algorithms greatly improves the system's capacity to process massive data, and the combination of the latent semantic analysis model and the segmented clustering model yields some improvement in the precision and coverage of personalized book recommendation. System testing and algorithm experiments verify the correctness of the system.
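The abstract mentions a similarity measure for mixed-type user data that draws on search terms, but does not give its formula. As a rough illustration only, the sketch below combines a range-normalized numeric similarity, a simple categorical match, and a Jaccard overlap of search-term sets into one weighted score. The `UserProfile` fields, the `age_range` value, the weights, and the sample users are all assumptions made for this example; this is a generic Gower-style combination, not the thesis's improved method.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical mixed-type user record; field names are illustrative."""
    age: float                                      # numeric attribute
    region: str                                     # categorical attribute
    search_terms: set = field(default_factory=set)  # terms the user searched for


def numeric_similarity(a, b, value_range):
    """1 minus the range-normalized absolute difference (Gower-style)."""
    if value_range <= 0:
        return 1.0
    return 1.0 - min(abs(a - b) / value_range, 1.0)


def categorical_similarity(a, b):
    """Simple matching: 1 if the categories agree, else 0."""
    return 1.0 if a == b else 0.0


def jaccard_similarity(a, b):
    """Overlap of two term sets; defined here as 0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def mixed_similarity(u, v, age_range=60.0, weights=(0.2, 0.2, 0.6)):
    """Weighted average of the three per-type similarities.

    The default weights are an assumption that emphasizes search terms.
    """
    w_num, w_cat, w_terms = weights
    total = w_num + w_cat + w_terms
    score = (
        w_num * numeric_similarity(u.age, v.age, age_range)
        + w_cat * categorical_similarity(u.region, v.region)
        + w_terms * jaccard_similarity(u.search_terms, v.search_terms)
    )
    return score / total


if __name__ == "__main__":
    # Made-up users, for demonstration only.
    alice = UserProfile(25, "Chengdu", {"hadoop", "mapreduce", "clustering"})
    bob = UserProfile(31, "Chengdu", {"hadoop", "clustering", "svd"})
    print(round(mixed_similarity(alice, bob), 3))
```

A score of this kind could serve as the distance input to a clustering step over users within one behavior segment; how the thesis actually weights or refines the search-term component is not stated in the abstract.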
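[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.3
Article ID: 2350346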
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2350346.html