Research on Performance Improvement of the K-means Algorithm and Its Application in a Movie Recommendation System

Published: 2018-12-12 02:17

[Abstract]: With the rapid development and widespread adoption of Internet technology, massive amounts of data are being generated, and cluster analysis of these data can create great commercial value; as a result, the K-means algorithm has been widely studied and applied. Because the data mined by clustering are generally massive and sparse, the traditional K-means algorithm, owing to its operating mechanism and computation strategy, is highly prone to memory overflow when processing such data. To address the efficiency problems of K-means, researchers have proposed a parallel sampling K-means algorithm, but it still suffers from unstable clustering results and an excessive number of iterations. This thesis studies how to improve the performance of parallel sampling K-means and how to apply it in a practical recommendation system. The specific work is as follows.

First, an improved parallel sampling K-means algorithm, IPSK (Improved Parallel Sampling K-means), is proposed. IPSK draws multiple samples from the full data set in parallel, computes initial cluster centers for each sample, and selects the higher-quality sample initial centers. After each sample is clustered, its cluster centers are stored in a cluster-center matrix; the points of this matrix are then clustered, and the resulting centers serve as the initial cluster centers for clustering the full data set. Experiments show that this way of computing the sample initial centers makes them more representative and reduces the algorithm's sensitivity to the choice of initial centers, giving good accuracy and stability when clustering big data.

Second, IPSK is introduced into user-based collaborative filtering, yielding an IPSK-based user-clustering collaborative filtering recommendation algorithm (IPSK-UCF).
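The IPSK initialization pipeline summarized above (sample, seed and cluster each sample, collect the sample centers into a matrix, cluster that matrix, then seed the full run) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say how the "higher-quality" sample initial centers are selected, so the sketch substitutes k-means++ seeding repeated a few times and keeps the seeding with the lowest within-cluster sum of squared errors (SSE). Samples are processed in an ordinary loop where the thesis processes them in parallel, and all function and parameter names (`ipsk_initial_centers`, `good_initial_centers`, `n_samples`, `sample_size`) are hypothetical.

```python
import random

Point = tuple[float, ...]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centers: list[Point]) -> int:
    """Index of the center closest to p."""
    return min(range(len(centers)), key=lambda j: dist2(p, centers[j]))


def sse(points: list[Point], centers: list[Point]) -> float:
    """Within-cluster sum of squared errors for a given set of centers."""
    return sum(dist2(p, centers[nearest(p, centers)]) for p in points)


def kmeans(points: list[Point], centers: list[Point], max_iter: int = 100):
    """Plain Lloyd iterations; returns (centers, iterations used)."""
    centers = list(centers)
    for it in range(max_iter):
        groups: list[list[Point]] = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # New center = mean of its group; an empty cluster keeps its old center.
        new = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[j]
               for j, g in enumerate(groups)]
        if new == centers:  # assignments stopped changing
            return new, it + 1
        centers = new
    return centers, max_iter


def kmeanspp_seed(points: list[Point], k: int, rng: random.Random) -> list[Point]:
    """k-means++ seeding, used here as a stand-in initialization."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        weights = [min(dist2(p, c) for c in centers) for p in points]
        if sum(weights) == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=weights)[0])
    return centers


def good_initial_centers(points, k, rng, trials=5):
    """Hypothetical quality rule: keep the seeding with the lowest SSE."""
    candidates = [kmeanspp_seed(points, k, rng) for _ in range(trials)]
    return min(candidates, key=lambda c: sse(points, c))


def ipsk_initial_centers(data, k, n_samples=4, sample_size=200, seed=0):
    """IPSK-style initialization sketch; returns k centers for the full data."""
    rng = random.Random(seed)
    center_matrix: list[Point] = []
    # Each loop iteration stands in for one parallel worker handling one sample.
    for _ in range(n_samples):
        sample = rng.sample(data, min(sample_size, len(data)))
        centers, _ = kmeans(sample, good_initial_centers(sample, k, rng))
        center_matrix.extend(centers)  # one row per sample cluster center
    # Cluster the matrix of sample centers; its centers seed the full run.
    final, _ = kmeans(center_matrix, good_initial_centers(center_matrix, k, rng))
    return final
```

A typical call is `centers, iterations = kmeans(data, ipsk_initial_centers(data, k=3))`, where `data` is a list of equal-length float tuples. Because the final run starts from centers that already summarize several samples, it tends to need only a few Lloyd iterations on well-separated data; reducing iterations and sensitivity to initialization is the goal the abstract states for IPSK.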
Finally, a movie recommendation system is designed and implemented to explore the application of IPSK-UCF in a real recommendation system. The system discovers users' interest preferences from their movie ratings and browsing histories and recommends movies they are likely to be interested in. The thesis describes the design and implementation of the system in detail and presents the implemented system.
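The recommendation step of a user-clustering collaborative filter in the spirit of IPSK-UCF can be illustrated with the minimal sketch below. Every modeling choice here is an assumption, since the abstract does not specify them: each user's cluster label is already known (for example from running IPSK on users' rating vectors) and is passed in as a dict; neighbours are restricted to the target user's cluster; similarity is cosine similarity over co-rated movies; and an unseen movie's score is the similarity-weighted average of neighbours' ratings. Browsing history is not modeled, and the names `recommend` and `cosine` are hypothetical.

```python
from math import sqrt

Ratings = dict[str, dict[str, float]]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity over the movies both users rated (0.0 if none)."""
    common = a.keys() & b.keys()
    na = sqrt(sum(a[m] ** 2 for m in common))
    nb = sqrt(sum(b[m] ** 2 for m in common))
    if na == 0 or nb == 0:  # no co-rated movies, or all-zero ratings
        return 0.0
    return sum(a[m] * b[m] for m in common) / (na * nb)


def recommend(target: str, ratings: Ratings, clusters: dict[str, int],
              top_n: int = 3) -> list[tuple[str, float]]:
    """Top-N unseen movies for `target`, using only same-cluster neighbours."""
    seen = ratings[target]
    neighbours = [u for u in ratings
                  if u != target and clusters[u] == clusters[target]]
    num: dict[str, float] = {}  # sum of similarity * rating per movie
    den: dict[str, float] = {}  # sum of similarity per movie
    for u in neighbours:
        w = cosine(seen, ratings[u])
        if w <= 0:  # skip neighbours with no positive similarity
            continue
        for movie, r in ratings[u].items():
            if movie not in seen:
                num[movie] = num.get(movie, 0.0) + w * r
                den[movie] = den.get(movie, 0.0) + w
    scores = {m: num[m] / den[m] for m in num}
    # Highest predicted score first; ties broken by movie id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Restricting the neighbour search to one cluster is where the clustering pays off: similarities are computed only against members of the target's cluster rather than against every user. A side effect is that a movie rated only by users outside the target's cluster is never scored for that user.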
[Degree-granting institution]: Xi'an University of Technology
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP391.3



Article number: 2373709


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2373709.html
