Research on Collaborative Filtering Recommendation Algorithms and Their MapReduce Implementation
Published: 2019-03-06 10:00
[Abstract]: With the rapid development of Internet technology, the volume of data has grown explosively, and the Internet has brought humanity into the era of big data. For users, picking out the information they actually need from massive data is like looking for a needle in a haystack. How to quickly mine the key information a user is interested in and push it to that user has become a hot topic for both academia and industry. In recent years, recommender systems, as an intelligent personalized information service technology, have risen rapidly at home and abroad and have been widely applied in e-commerce, video entertainment, social networks, and other fields. After years of development, recommender systems have given rise to several recommendation techniques, including content-based recommendation, data-mining-based recommendation, and collaborative filtering recommendation. Among these, collaborative filtering is the most widely used. However, collaborative filtering algorithms suffer from data sparsity and low recommendation accuracy. In the big data setting these two problems are further magnified and have become a bottleneck for the development and application of recommender systems. On this basis, this thesis completes the following work.

First, to address data sparsity in collaborative filtering recommender systems, a data filling method based on expert users and item trust is proposed. Using an expert trust value, the method selects users who have given many ratings of good quality as expert users. It also combines the number of ratings an item has received with the standard deviation of those ratings into an item trust value, treats items with high trust as feasible items, and fills the missing ratings of high-trust items with the ratings of expert users. This reduces the sparsity of the data while preserving filling quality. The effectiveness of the method is verified by experiments.

Second, combining the K-Means algorithm with item-based collaborative filtering, a collaborative filtering recommendation algorithm based on clustering and an asymmetric-weight hybrid similarity (CFCA) is proposed. The algorithm first performs K-Means clustering based on rating-stable items, then computes similarities within each cluster using the asymmetric-weight hybrid similarity, and produces recommendations from the result. Because it takes into account both the overlap of users who rated a pair of items and the number of ratings each item has received, it improves the accuracy of the similarity computation and, in turn, the quality of the recommendations. The CFCA algorithm is compared experimentally with traditional collaborative filtering algorithms under different conditions. The results show that the proposed algorithm effectively improves recommendation accuracy.

Third, to improve efficiency and reduce running time, a MapReduce parallel programming model is designed for the CFCA algorithm, and four stages are parallelized under this model: data preprocessing, K-Means clustering based on rating-stable items, computation of the asymmetric-weight hybrid similarity, and rating prediction. Parallel computation addresses the processing-efficiency problem of the algorithm.
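The abstract does not give the formula for the asymmetric-weight hybrid similarity or the layout of the MapReduce jobs, so the following is only a minimal sketch under stated assumptions, not the thesis's CFCA implementation. It assumes cosine similarity over the users who rated both items, scaled by the weight |U_i ∩ U_j| / |U_i| (the share of item i's raters who also rated item j), and it simulates the map, shuffle, and reduce phases in a single Python process rather than on Hadoop. The function names `map_user`, `shuffle`, `reduce_pair`, and `item_similarities` and the toy ratings are hypothetical.

```python
from collections import defaultdict
from itertools import permutations
from math import sqrt

# Toy ratings: user -> {item: rating}. Illustrative data only.
ratings = {
    "u1": {"A": 5.0, "B": 4.0},
    "u2": {"A": 4.0, "B": 5.0, "C": 1.0},
    "u3": {"A": 2.0},
}


def map_user(user_ratings):
    """Map step: for one user, emit ((i, j), (r_i, r_j)) for every ordered item pair."""
    for i, j in permutations(sorted(user_ratings), 2):
        yield (i, j), (user_ratings[i], user_ratings[j])


def shuffle(pairs):
    """Group emitted values by key, standing in for the framework's shuffle phase."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_pair(key, values, item_counts):
    """Reduce step: cosine over co-raters times an ASSUMED asymmetric overlap weight."""
    i, _ = key
    dot = sum(a * b for a, b in values)
    norm_i = sqrt(sum(a * a for a, _ in values))
    norm_j = sqrt(sum(b * b for _, b in values))
    if norm_i == 0.0 or norm_j == 0.0:
        return 0.0  # all-zero co-ratings carry no signal
    cosine = dot / (norm_i * norm_j)
    # Assumed weight: fraction of item i's raters who also rated item j.
    # It depends on i but not on j, so sim(i, j) != sim(j, i) in general.
    weight = len(values) / item_counts[i]
    return weight * cosine


def item_similarities(all_ratings):
    """Run the simulated map -> shuffle -> reduce pipeline over all users."""
    item_counts = defaultdict(int)
    for user_ratings in all_ratings.values():
        for item in user_ratings:
            item_counts[item] += 1
    emitted = (kv for ur in all_ratings.values() for kv in map_user(ur))
    grouped = shuffle(emitted)
    return {key: reduce_pair(key, vals, item_counts) for key, vals in grouped.items()}


sims = item_similarities(ratings)
```

In this toy data, item A has three raters and item B has two, both of whom also rated A. The assumed weight therefore gives `sims[("B", "A")]` a larger value than `sims[("A", "B")]`, which is the kind of asymmetry the abstract describes when it says the similarity accounts for both rating overlap and the number of ratings per item.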
[Degree-granting institution]: Sichuan Normal University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.3
Article ID: 2435426
Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/2435426.html