Research and Implementation of a Personalized Recommendation System Based on the Hadoop Big Data Framework

Published: 2018-11-22 19:42

[Abstract]: Information overload is an increasingly prominent problem. Three relatively mature approaches address it: website navigation, search engines, and recommendation systems. Website navigation collects well-known sites and organizes them by category, while search engines build indexes over massive numbers of web pages. When users cannot state their needs clearly, however, both fall short, and this is the case recommendation systems are designed for: by analyzing a user's historical behavior records, a recommendation system proactively suggests content the user is likely to be interested in. With the rapid growth of the Internet, the volume of information is increasing geometrically, and traditional recommendation systems tend to hit computational bottlenecks on massive data. Traditional systems also give insufficient consideration to the fact that user interests change over time and are somewhat scattered. To address these problems, this thesis draws on earlier recommendation system designs and, targeting personalized book recommendation within a search engine, studies and implements a hybrid recommendation scheme based on latent semantic analysis and piecewise clustering. The Hadoop big data processing framework is used to handle the system's massive data processing.

The thesis first studies how to collect user behavior data in a search engine setting. It analyzes the types of user behavior and their characteristics, and applies a different collection and standardization method to each data type to complete the collection of user behavior data.

Second, to address the distinctive nature of user behavior in a search engine and the variability of user interests, the thesis proposes a latent semantic analysis model and a piecewise clustering model that mine, respectively, long-term interests and immediate interests from large-scale user behavior data. The latent semantic analysis model recommends by content, which alleviates the cold-start problem for both users and books and raises recommendation coverage. The collaborative filtering model based on piecewise clustering partitions user behavior by attribute and content, which extracts a user's interests in different periods, further improves recommendation performance, and gives the results a degree of novelty. In addition, for the user similarity computation required during piecewise clustering, the thesis proposes an improved similarity measure for mixed-type data based on users' search terms.

Finally, on top of the Hadoop big data processing framework, the thesis studies how to parallelize user behavior preprocessing and the recommendation algorithms, and completes the design and implementation of the personalized book recommendation system for search engines. Introducing the Hadoop platform and designing parallel recommendation algorithms greatly improves the system's ability to process massive data. Combining the latent semantic analysis model with the piecewise clustering model improves the precision and coverage of personalized book recommendation to some extent. System tests and algorithm experiments verify the correctness of the approach.
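The abstract says that user behavior preprocessing is parallelized on Hadoop, but it does not describe the jobs themselves. The following is only a minimal sketch of the general MapReduce pattern such preprocessing could follow: it counts how often each user searched for each term. It is plain Python that imitates the map, shuffle, and reduce phases in a single process, not Hadoop code and not the thesis's implementation. The log format, the sample records, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical search-log records: (user_id, search_query).
# These are illustrative values, not data from the thesis.
LOGS = [
    ("u1", "hadoop tutorial"),
    ("u2", "python cookbook"),
    ("u1", "hadoop mapreduce"),
    ("u2", "python tutorial"),
]


def map_phase(record):
    """Emit ((user, term), 1) for every term in one search query."""
    user, query = record
    for term in query.lower().split():
        yield (user, term), 1


def shuffle(pairs):
    """Group mapped values by key, as a MapReduce framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one (user, term) key."""
    return key, sum(values)


def user_term_counts(logs):
    """Run map, shuffle, and reduce, then collect per-user term counts."""
    mapped = chain.from_iterable(map_phase(record) for record in logs)
    reduced = (reduce_phase(k, v) for k, v in shuffle(mapped).items())
    profiles = defaultdict(dict)
    for (user, term), count in reduced:
        profiles[user][term] = count
    return dict(profiles)


if __name__ == "__main__":
    print(user_term_counts(LOGS))
```

Because each mapper call looks at a single log record and each reducer call looks at a single key, both phases can be spread across machines without coordination. This is the property that makes this style of preprocessing a natural fit for a framework such as Hadoop.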
[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.3


Article ID: 2350346

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2350346.html
