Research on Performance Improvement of the K-means Algorithm and Its Application in a Movie Recommendation System
[Abstract]: The rapid development and spread of Internet technology has produced enormous volumes of data, and cluster analysis of that data can yield substantial commercial value. The K-means algorithm has therefore been widely studied and applied. Data subject to cluster mining is typically massive and sparse, and the traditional K-means algorithm, because of its execution mechanism and computing strategy, is prone to memory overflow when processing such massive data. A parallel sampling K-means algorithm has been proposed to address this efficiency problem, but its clustering results are unstable and it needs too many iterations. This thesis focuses on improving the performance of the parallel sampling K-means algorithm and on applying it in a practical recommendation system. The specific work is as follows. First, an improved parallel sampling K-means algorithm, IPSK (Improved Parallel Sampling K-means), is proposed. It draws multiple samples from the full data set in parallel, computes initial cluster centers for each sample, and selects the sample initial centers of good quality. The cluster centers obtained after clustering each sample are stored in a cluster-center matrix, the points in that matrix are themselves clustered, and the resulting centers are used as the initial cluster centers for clustering the full data set. Experimental results show that the algorithm makes the initial cluster centers more representative, reduces the algorithm's sensitivity to the choice of initial centers, and achieves good accuracy and stability when clustering big data. Second, the IPSK algorithm is introduced into the user-based collaborative filtering recommendation algorithm, yielding a user-clustering collaborative filtering recommendation algorithm based on IPSK (IPSK-UCF).
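The sampling-based initialization described above can be sketched roughly as follows. This is an illustrative reading of the abstract, not the thesis's implementation: the function names (`kmeans`, `cluster_one_sample`, `ipsk_like`), the parameters `n_samples` and `sample_size`, and the use of a thread pool as the parallel backend are all assumptions. The abstract does not say how "good quality" initial centers are selected, so the sketch skips that step and starts each sample from randomly chosen points.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
            groups[j].append(p)
        # An empty cluster keeps its previous center.
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def cluster_one_sample(job):
    """Draw one random sample and cluster it (hypothetical helper)."""
    points, k, sample_size, seed = job
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    init = rng.sample(sample, k)  # assumption: random start, no quality filter
    return kmeans(sample, init)


def ipsk_like(points, k, n_samples=8, sample_size=200, seed=0):
    """Sample-based initialization followed by K-means on the full data set."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    jobs = [(points, k, size, rng.randrange(2**31)) for _ in range(n_samples)]
    # Threads stand in for whatever parallel backend is actually used.
    with ThreadPoolExecutor() as pool:
        per_sample = list(pool.map(cluster_one_sample, jobs))
    # "Cluster-center matrix": every center found in every sample.
    center_matrix = [c for centers in per_sample for c in centers]
    # Cluster the matrix itself; starting from the first sample's centers
    # is an assumption of this sketch.
    seeds = kmeans(center_matrix, per_sample[0])
    # Use the result as the initial centers for the full data set.
    return kmeans(points, seeds)
```

Calling `ipsk_like(points, k)` with `points` as a list of equal-length numeric tuples returns `k` centers. Because only the small samples and the center matrix are clustered from a random start, the full data set is scanned only in the final refinement pass.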
Finally, a movie recommendation system is designed and implemented to explore the application of the IPSK-UCF algorithm in a real recommendation system. The system infers users' interests and preferences from their movie ratings and browsing history and recommends movies they are likely to enjoy. The thesis describes the design and implementation of the system in detail and demonstrates its results.
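The abstract does not spell out how IPSK-UCF combines clustering with collaborative filtering. One common reading of cluster-based user collaborative filtering is that neighbours are searched only among users in the same cluster. The sketch below illustrates that reading under stated assumptions: the `ratings` layout (user to movie to score), the precomputed `cluster_of` mapping, cosine similarity over co-rated movies, and the function names are all hypothetical choices, not taken from the thesis.

```python
import math


def cosine(u, v):
    """Cosine similarity over the movies both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[m] * v[m] for m in common)
    nu = math.sqrt(sum(u[m] ** 2 for m in common))
    nv = math.sqrt(sum(v[m] ** 2 for m in common))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(user, ratings, cluster_of, top_n=3):
    """Rank unseen movies for `user` using only same-cluster neighbours."""
    mine = ratings[user]
    peers = [v for v in ratings
             if v != user and cluster_of[v] == cluster_of[user]]
    scores, weights = {}, {}
    for v in peers:
        sim = cosine(mine, ratings[v])
        if sim <= 0:
            continue
        for movie, rating in ratings[v].items():
            if movie in mine:
                continue  # skip movies the user has already rated
            scores[movie] = scores.get(movie, 0.0) + sim * rating
            weights[movie] = weights.get(movie, 0.0) + sim
    # Similarity-weighted average rating per candidate movie.
    predicted = {m: scores[m] / weights[m] for m in scores}
    # Highest predicted rating first; ties broken by movie id.
    return sorted(predicted, key=lambda m: (-predicted[m], m))[:top_n]
```

Restricting the neighbour search to one cluster means similarity is computed against a fraction of the user base rather than all users, which is the usual motivation for clustering users before filtering.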
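[Degree-granting institution]: Xi'an University of Technology (西安理工大學)
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.3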