Research and Application of a Hadoop-Based Friend Recommendation System for Social Networks
Published: 2018-11-03 17:26
[Abstract]: Since 2000, Internet technology has developed rapidly and become part of daily life. Shopping, social networking, and video websites generate large volumes of data every day, and users face the problem of information overload. Search engines and recommendation systems both address this problem. Unlike a search engine, a recommendation system does not require the user to issue a query: when users do not know what they need, it can analyze massive data, mine their interests, and surface valuable content. Sina Weibo, the best-known social networking site in China, has a large user base, and these users post comments, moods, and other content every day. User interests can be extracted from these posts to provide personalized friend recommendations. On this basis, this thesis proposes a distributed parallel algorithm based on the MapReduce programming model and designs and implements a Hadoop-based friend recommendation system. The main work is as follows:
1. The thesis studies the application of content-based recommendation algorithms to friend recommendation, focusing on the TF-IDF algorithm. It identifies shortcomings of TF-IDF and improves it with respect to the distribution of feature words, yielding the TF-DFI-DFO algorithm. Experiments compare TF-DFI-DFO with the original TF-IDF to evaluate the improved algorithm.
2. It presents the design and implementation of the friend recommendation system, analyzing the data collection, data processing, and recommendation decision modules in detail, with emphasis on the MapReduce-based distributed implementation of TF-DFI-DFO in the recommendation decision module.
3. It implements TF-DFI-DFO in a distributed manner under the MapReduce model, builds a vector space model from the results, computes the similarity between texts, and produces the final recommendations.
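The abstract does not define TF-DFI-DFO, so the following is only a minimal sketch of the baseline it starts from: standard TF-IDF weighting, a vector space model with one vector per user, and cosine similarity to rank candidate friends. It runs in a single process rather than on Hadoop MapReduce, and the function names (`tf_idf_vectors`, `cosine`, `recommend`) and the toy `posts` data are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per user, using standard TF-IDF.

    This is the textbook baseline, not the thesis's TF-DFI-DFO variant.
    """
    n_docs = len(docs)
    # Document frequency: number of users whose text contains the term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    vectors = {}
    for user, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        vectors[user] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(user, vectors, top_n=3):
    """Rank the other users by similarity to `user`, highest first."""
    scores = [
        (other, cosine(vectors[user], vec))
        for other, vec in vectors.items()
        if other != user
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical, already-tokenized posts for three users.
posts = {
    "alice": ["hadoop", "mapreduce", "cluster", "hadoop"],
    "bob": ["hadoop", "cluster", "spark"],
    "carol": ["movie", "music", "travel"],
}
vectors = tf_idf_vectors(posts)
ranking = recommend("alice", vectors)
```

In this toy data, "bob" shares terms with "alice" and ranks first, while "carol" shares none and scores zero. In a MapReduce setting, the term counts and document frequencies would typically be computed in separate map and reduce phases; how the thesis splits these jobs is not stated in the abstract.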
[Degree-granting institution]: Xi'an Polytechnic University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP391.3
Article ID: 2308453
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2308453.html