
Design of a Multidimensional Clustering Mining Algorithm Based on Grid-Density Partitioning

Published: 2018-03-19 10:28

Topic: clustering algorithms　Focus: grid　Source: Xi'an University of Finance and Economics, master's thesis, 2014　Document type: degree thesis

[Abstract]: Cluster analysis is an important part of data mining and one of its core analytical activities. The clustering algorithm is the heart of any cluster analysis and determines the quality of its results. How to improve clustering efficiency further, and so reduce cost and burden for users, while keeping the algorithm stable and effective is therefore a worthwhile research direction. Traditional clustering algorithms place heavy demands on computer hardware and run for a long time on massive data sets, so this thesis proposes a new clustering algorithm based on grids and density. Grid-based clustering is generally fast and efficient, but its clustering quality is not very good. Density-based clustering can find clusters of arbitrary shape, but its time complexity is high on high-dimensional data. Because the two approaches are complementary, partitioning the sample space with a combined grid-density strategy can greatly improve clustering efficiency.

The algorithm proceeds as follows. First, a grid is created and the data space receives an initial grid partition. Second, the sample space is partitioned: using the computed grid density threshold, the grid cells are divided into a high-density region and a low-density region. All cells in the high-density region are sorted by density, the densest cell is located, and the nearest surrounding low-density cells are used to delimit the first high-density cluster. The points of that cluster are removed, the remaining high-density cells are sorted again, and the process repeats until the final partition of the space is formed. Finally, the centroid of each sub-cluster is computed and sub-clusters with neighboring centroids are merged to form new centroids. Merging continues until the number of clusters equals the given number, which yields the final clustering result.

The thesis first describes the algorithm theoretically and argues that its design is sound. It then reports an empirical analysis on several groups of data generated randomly in Matlab. The results show that the algorithm runs significantly faster than the classical K-means algorithm while its between-group sum of squared deviations differs little from that of K-means.
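The steps in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, which was evaluated in Matlab and whose details the abstract does not give. Several choices here are assumptions made only for illustration: the names `grid_density_clusters`, `cell_size`, and `density_threshold` are hypothetical, a cell's density is taken to be its point count, "delimited by the nearest low-density cells" is read as absorbing every adjacent high-density cell, and points in low-density cells are simply left unassigned. The helper `between_group_ss` uses the standard definition of the between-group sum of squares, which is assumed to be the measure the abstract compares against K-means.

```python
from collections import defaultdict
from math import dist, floor


def centroid(points):
    """Arithmetic mean of a non-empty list of equal-length coordinate tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def grid_density_clusters(points, cell_size, density_threshold, k):
    """Illustrative sketch: grid partition, dense-cell growth, centroid merging."""
    # Step 1: initial grid partition; each cell is keyed by its integer indices.
    cells = defaultdict(list)
    for p in points:
        cells[tuple(floor(x / cell_size) for x in p)].append(p)

    # Step 2: keep high-density cells; low-density cells act as borders
    # (assumption: their points are left unassigned).
    dense = {c: pts for c, pts in cells.items() if len(pts) >= density_threshold}

    # Step 3: seed at the densest remaining cell and absorb adjacent dense
    # cells, then repeat on what is left.
    subclusters = []
    while dense:
        seed = max(dense, key=lambda c: len(dense[c]))
        frontier, members = [seed], list(dense.pop(seed))
        while frontier:
            cur = frontier.pop()
            near = [c for c in dense
                    if all(abs(a - b) <= 1 for a, b in zip(c, cur))]
            for c in near:
                members.extend(dense.pop(c))
                frontier.append(c)
        subclusters.append(members)

    # Step 4: merge the two sub-clusters with the closest centroids until
    # only k remain (fewer are returned if fewer were found).
    while len(subclusters) > k:
        cents = [centroid(s) for s in subclusters]
        i, j = min(
            ((a, b) for a in range(len(cents)) for b in range(a + 1, len(cents))),
            key=lambda ab: dist(cents[ab[0]], cents[ab[1]]),
        )
        subclusters[i].extend(subclusters.pop(j))  # j > i, so index i stays valid
    return subclusters


def between_group_ss(clusters):
    """Between-group sum of squares: sum of n_k * ||centroid_k - grand mean||^2."""
    grand = centroid([p for c in clusters for p in c])
    return sum(len(c) * dist(centroid(c), grand) ** 2 for c in clusters)
```

Calling `grid_density_clusters(points, cell_size=1.0, density_threshold=2, k=2)` on a list of coordinate tuples returns a list of clusters, each a list of points. The adjacency check scans all remaining dense cells, which keeps the sketch short and dimension-agnostic but is slower than looking up neighbors by index. A real implementation would also need to decide how to assign the points left in low-density cells.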
[Degree-granting institution]: Xi'an University of Finance and Economics
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: C81


Link to this article: http://sikaile.net/shekelunwen/shgj/1633868.html
