Research on a Density Peak Clustering Algorithm Based on the Normal Distribution

Published: 2018-09-11 07:48

[Abstract]: Clustering is an important class of machine learning methods that partitions a data set into groups according to similarity. Cluster analysis is widely used in machine learning, pattern recognition, bioinformatics, and image processing. In 2014, Alex Rodriguez et al. proposed a new density-based algorithm in Science: clustering by fast search and find of density peaks (DPC), also called density peak clustering. The algorithm uses two features of each data point, its local density and its distance to the nearest point of higher density, to find potential cluster centers. DPC is simple, produces a clustering in a single step, and gives good results. However, it requires a person to inspect the decision graph and pick the potential cluster centers by hand, which reduces the algorithm's efficiency. To make the clustering automatic, this thesis examines where points fall on the decision graph and proposes using the product Z of density and distance as a new criterion for potential cluster centers, together with a statistical method for screening them. Because only potential cluster centers have both high density and large distance, their Z values are far larger than those of non-center points. Assuming that Z follows a normal distribution, an upper bound can be determined statistically, and any point whose Z value exceeds this bound is automatically treated as a cluster center. Experimental results show that this normal-distribution-based method correctly identifies the potential cluster centers and selects them in much the same way as manual inspection of the decision graph. Compared with other well-regarded clustering algorithms, the normal-distribution-based density peak clustering algorithm performs better on data sets of different shapes and yields good clustering results.
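The selection rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not give the exact form of the upper bound, so the sketch assumes mean(Z) + k * std(Z) with k = 3, and it assumes a Gaussian-kernel density with a hypothetical cutoff distance `d_c`. The function names `dpc_features` and `select_centers` are illustrative only.

```python
import math
import random
import statistics


def dpc_features(points, d_c):
    """Compute density rho and distance delta for every point (DPC-style)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Gaussian-kernel local density (assumed variant); d_c is the cutoff distance.
    rho = [
        sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
        for i in range(n)
    ]
    # Visit points from densest to sparsest so ties are broken deterministically.
    order = sorted(range(n), key=lambda i: rho[i], reverse=True)
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention, use its largest distance to any point.
            delta[i] = max(dist[i])
        else:
            # Distance to the nearest point that ranks higher in density.
            delta[i] = min(dist[i][j] for j in order[:rank])
    return rho, delta


def select_centers(rho, delta, k=3.0):
    """Return indices whose Z = rho * delta exceeds mean(Z) + k * std(Z)."""
    z = [r * d for r, d in zip(rho, delta)]
    # Assumed upper bound: the abstract only says a bound is derived
    # from the normality assumption, not which one.
    upper = statistics.fmean(z) + k * statistics.pstdev(z)
    return [i for i, value in enumerate(z) if value > upper]


if __name__ == "__main__":
    # Two synthetic, well-separated 2D blobs (made-up demo data).
    rng = random.Random(0)
    blob_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
    blob_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(60)]
    rho, delta = dpc_features(blob_a + blob_b, d_c=1.0)
    print(select_centers(rho, delta))
```

The sketch stops at center selection; the usual DPC assignment step (each remaining point joins the cluster of its nearest higher-density neighbor) is omitted. Because true centers inflate the standard deviation of Z, a fixed k may miss centers when many clusters are present, so the choice of k here should be read as a placeholder.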
[Degree-granting institution]: Zhejiang University of Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP311.13


[Thesis record]: 鄭P; Research on a Density Peak Clustering Algorithm Based on the Normal Distribution [D]; Zhejiang University of Technology; 2016


Article ID: 2236063


Link to this article: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2236063.html
