
Research on an Improved Affinity Propagation Clustering Algorithm and Its Application

Published: 2018-06-12 17:18

  Topics: affinity propagation clustering + weighted Mahalanobis distance; Source: Nanjing University of Science and Technology, 2017 master's thesis


[Abstract]: Cluster analysis is an important part of multivariate statistical analysis and is widely applied across many fields. Affinity propagation (AP) clustering is an unsupervised clustering algorithm proposed by Frey and Dueck in 2007. It requires neither initial cluster centers nor the number of clusters: once the similarity matrix is constructed and the preference parameter is set, a message-passing mechanism automatically determines suitable exemplars (class representative points). Preliminary studies show that the algorithm has many good properties, such as fast computation, a small sum of squared errors, and high clustering accuracy, but it also has shortcomings.
First, AP uses the negative Euclidean distance as its similarity measure. Euclidean distance is appropriate only when samples are mutually independent, is sensitive to the units in which attributes are measured, and treats every attribute as equally important to the distance. This thesis proposes a weighted Mahalanobis distance based on the mean square deviation (standard deviation) and uses the negative of that distance as the AP similarity measure. The Mahalanobis distance adapts to the geometric distribution of the data and removes the interference of correlations among attributes, while weighting the attributes by their mean square deviation accounts for the effect of relative attribute importance on the final clustering. This similarity measure both broadens the range of data the algorithm can handle and makes the clustering results more accurate.
Second, AP sets the preference parameter P of every point to the same value, which assumes that all sample points are equally likely to become exemplars and ignores how the data distribution affects whether a given point can become one. To address this, the thesis sets P under the assumption that the larger the sum of the membership degrees of all other points to a given point, the more likely that point is to become an exemplar, so different points receive different P values. Setting P from the data characteristics in this way, that is, assigning higher P values in advance to points that are more likely to become exemplars, reduces the number of iterations and the running time. In addition, based on the Cauchy convergence criterion, the thesis empirically analyzes the convergence of the availability and responsibility matrices in the model.
Finally, to obtain the k clusterings with 1 to k clusters, the thesis proposes a method that clusters while dynamically adjusting P with an adaptive step size. On this basis it studies the relationship between P and the number of clusters, further optimizes the model, and estimates the optimal number of clusters with the Gap index. Simulation experiments on several data sets from the UCI repository show that the model is feasible and outperforms the baseline.
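The abstract's two main ingredients, a similarity built from a weighted Mahalanobis distance and AP message passing with a preference value per point, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names, the weighting rule (each attribute's share of the total standard deviation after min-max scaling), the way the weights enter the distance, and the damping and stopping settings are all hypothetical, since the abstract does not give the exact formulas.

```python
import numpy as np


def weighted_mahalanobis_similarity(X):
    """Return S with S[i, j] = -(weighted Mahalanobis distance of rows i, j)."""
    X = np.asarray(X, dtype=float)
    # ASSUMPTION: weight = attribute's share of the total standard deviation
    # after min-max scaling; the thesis may define the weights differently.
    span = X.max(axis=0) - X.min(axis=0)
    scaled = (X - X.min(axis=0)) / np.where(span > 0, span, 1.0)
    std = scaled.std(axis=0)
    w = std / std.sum()
    # Pseudo-inverse keeps this usable when the covariance matrix is singular.
    cov_inv = np.linalg.pinv(np.atleast_2d(np.cov(X, rowvar=False)))
    root_w = np.diag(np.sqrt(w))
    M = root_w @ cov_inv @ root_w  # ASSUMPTION: W^(1/2) C^(-1) W^(1/2)
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,kl,ijl->ij", diff, M, diff)
    return -np.sqrt(np.maximum(d2, 0.0))


def affinity_propagation(S, preference, damping=0.5, max_iter=200, stable_iter=15):
    """Standard AP updates; `preference` may be a scalar or one value per point."""
    S = np.array(S, dtype=float)
    n = S.shape[0]
    rows = np.arange(n)
    S[rows, rows] = preference  # the diagonal of S holds the preferences
    R = np.zeros((n, n))  # responsibilities
    A = np.zeros((n, n))  # availabilities
    last, stable = None, 0
    exemplars = np.array([], dtype=int)
    for _ in range(max_iter):
        # Responsibility: r(i,k) = s(i,k) - max_{k' != k} (a(i,k') + s(i,k'))
        AS = A + S
        idx = AS.argmax(axis=1)
        first = AS[rows, idx]
        AS[rows, idx] = -np.inf
        second = AS.max(axis=1)
        R_new = S - first[:, None]
        R_new[rows, idx] = S[rows, idx] - second
        R = damping * R + (1 - damping) * R_new
        # Availability: a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0.0)
        Rp[rows, rows] = R[rows, rows]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        self_avail = A_new[rows, rows].copy()  # a(k,k) is not clipped at 0
        A_new = np.minimum(A_new, 0.0)
        A_new[rows, rows] = self_avail
        A = damping * A + (1 - damping) * A_new
        # Stop once the exemplar set has stayed unchanged for a while.
        exemplars = np.flatnonzero((A + R).diagonal() > 0)
        key = tuple(exemplars)
        stable = stable + 1 if key == last else 0
        last = key
        if stable >= stable_iter and exemplars.size > 0:
            break
    if exemplars.size == 0:
        return exemplars, np.full(n, -1)
    labels = exemplars[S[:, exemplars].argmax(axis=1)]
    labels[exemplars] = exemplars
    return exemplars, labels


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(5.0, 0.5, (20, 2))])
S = weighted_mahalanobis_similarity(X)
off_diag = S[~np.eye(len(X), dtype=bool)]
exemplars, labels = affinity_propagation(S, preference=np.median(off_diag))
print(len(exemplars), "exemplars found")
```

The usage lines pass a single shared preference (the median similarity, the common default in the AP literature). To mimic the per-point idea described in the abstract, a length-n array could be passed as `preference` instead, with larger values for points judged more likely to be exemplars; how the thesis derives those values from membership degrees is not specified here.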
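[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13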


Article ID: 2010441


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2010441.html

