A Density-Based Distributed Clustering Method

Published: 2018-12-12 08:55

[Abstract]: Clustering is an important data analysis method in data mining. It partitions unlabeled data into clusters according to the similarity between data points. CSDP is a density-based clustering algorithm whose efficiency is relatively low when the data volume is large or the data dimensionality is high. To improve its efficiency, a density-based distributed clustering method, MRCSDP, is proposed, which clusters the experimental data using the MapReduce framework. The method defines the concepts of the independent computing unit and the independent computing block. First, the data are split into several data blocks, the independent computing units and independent computing blocks are constructed, and the tasks for the independent computing blocks are assigned across the cluster. Next, distributed computation yields the local density of each data block; the local densities are merged into the global density, the center value is computed from the global density, and the candidate cluster centers in each data block are obtained from the global density and the center value. Finally, the final cluster centers are elected from the candidate cluster centers. MRCSDP achieves good clustering results while substantially reducing time complexity. Experimental results show that, in a distributed environment, MRCSDP processes large-scale data faster and more effectively than CSDP and balances the load across nodes.
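The pipeline in the abstract (split into blocks, compute local densities in parallel, merge them into a global density, derive a center value, pick candidate centers) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not define the independent computing unit, the density formula, or the center value, so the sketch assumes: a unit is one pair of data blocks; density is the number of neighbours within a cutoff `d_c`; the center value is the distance to the nearest point of higher density, as in density-peaks clustering. All function and parameter names are hypothetical, and the map and reduce steps run in a single process rather than on a MapReduce cluster.

```python
import math
from itertools import combinations_with_replacement


def split_blocks(points, n_blocks):
    """Split (index, point) pairs into n_blocks roughly equal data blocks."""
    indexed = list(enumerate(points))
    return [indexed[i::n_blocks] for i in range(n_blocks)]


def map_block_pair(block_a, block_b, d_c, same_block):
    """Map step (assumed unit: one pair of blocks).

    Emits (point_id, 1) for both endpoints of every pair closer than d_c.
    """
    emitted = []
    for i, p in block_a:
        for j, q in block_b:
            if same_block and j <= i:
                continue  # inside one block, visit each unordered pair once
            if math.dist(p, q) < d_c:
                emitted.append((i, 1))
                emitted.append((j, 1))
    return emitted


def reduce_density(emitted, n_points):
    """Reduce step: sum the partial (local) counts into the global density."""
    rho = [0] * n_points
    for point_id, count in emitted:
        rho[point_id] += count
    return rho


def global_density(points, n_blocks, d_c):
    """Run every block pair through the map step, then merge the results."""
    blocks = split_blocks(points, n_blocks)
    emitted = []
    for a, b in combinations_with_replacement(range(n_blocks), 2):
        emitted.extend(map_block_pair(blocks[a], blocks[b], d_c, a == b))
    return reduce_density(emitted, len(points))


def center_values(points, rho):
    """Assumed center value: distance to the nearest point of higher density.

    A point with no denser point gets its largest distance to any point.
    """
    delta = []
    for i, p in enumerate(points):
        higher = [math.dist(p, q) for j, q in enumerate(points) if rho[j] > rho[i]]
        if higher:
            delta.append(min(higher))
        else:
            delta.append(max(math.dist(p, q) for q in points))
    return delta


def pick_centers(rho, delta, k):
    """Rank points by density * center value and keep the top k as centers."""
    order = sorted(range(len(rho)), key=lambda i: rho[i] * delta[i], reverse=True)
    return order[:k]
```

Because each block pair is processed independently of the others, the map calls could be handed to separate workers, which is the property the block decomposition is meant to provide. The pairing scheme here is only one way to obtain that independence.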
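[Author affiliation]: College of Computer Science and Technology, Jilin University; Key Laboratory of Symbolic Computation and Knowledge Engineering of the Ministry of Education, Jilin University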
[Classification number]: TP311.13


Article ID: 2374295


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2374295.html
