
Research on and Improvement of Density-Based Clustering Algorithms

Published: 2018-05-12 08:12

  Topic: clustering + density peaks; Source: Inner Mongolia University, master's thesis, 2017


[Abstract]: Cluster analysis is a technique for grouping data by the similarity between data points without any prior knowledge. It is known as unsupervised classification in pattern recognition and as nonparametric estimation in statistics. Cluster analysis is widely applied in many academic fields, such as bioinformatics, information security, and text clustering. Over the past decades, thousands of clustering algorithms have been proposed, yet many questions remain open: how to handle clusters of different shapes and densities, how to compute sensibly on high-dimensional data, how to determine the number of clusters in a result, how to detect noise points, and how to define and evaluate a correct cluster. In 2014, Alex Rodriguez and Alessandro Laio proposed a new heuristic clustering algorithm, CFSFDP (Clustering by Fast Search and Find of Density Peaks). The algorithm needs few initial parameters, runs quickly, can detect the number of target clusters, and is insensitive to noisy data. This thesis confirms its effectiveness through a series of experiments, and the algorithm's authors showed that it can handle high-dimensional data by clustering images from the Olivetti face database. Further study, however, shows that the algorithm performs poorly in certain situations. First, the initial cluster centers must be selected manually, and cluster centers lying in sparse (low-density) regions cannot be extracted effectively. Second, the algorithm assumes that each cluster in the dataset has exactly one local density peak, so clusters with multiple density peaks, and clusters that share a density peak, are partitioned incorrectly. Third, its noise identification method causes a relatively large number of data points to be classified as noise.
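The abstract names CFSFDP but does not spell out its steps. The following is a minimal sketch of the original 2014 algorithm as commonly described, not of the improved algorithm proposed in the thesis. The function name `density_peaks`, the cutoff parameter `d_c`, the product `rho * delta` as the decision value, and the choice of the top `n_clusters` decision values as centers (standing in for the manual selection on the decision graph) are all assumptions made for illustration.

```python
import math


def density_peaks(points, d_c, n_clusters):
    """Illustrative CFSFDP sketch; returns (labels, rho, delta)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density rho_i: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Points in order of decreasing density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest point that comes earlier in the
    # density order; the densest point gets its largest distance instead.
    delta = [0.0] * n
    parent = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Decision value (assumed here to be rho * delta); the largest values
    # mark candidate centers. The stable sort keeps the densest point first.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(order, key=lambda i: -gamma[i])[:n_clusters]

    # Each remaining point inherits the label of its nearest denser neighbor.
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

Because labels propagate in order of decreasing density, every point's nearest denser neighbor is already labeled when the point is reached, so a single pass is enough. This one-peak-per-cluster propagation is also the step that the abstract identifies as the source of incorrect partitions when a cluster has several density peaks.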
Based on these findings, this thesis proposes a new density-peak-based algorithm. The improved algorithm builds the decision graph with a modified decision-value calculation and identifies cluster centers automatically by locating the inflection point of the decision graph. It then splits and merges incorrectly partitioned sub-clusters by constructing the local density distribution of each sub-cluster and applying an improved hierarchical clustering idea. Finally, it identifies noise with a newly introduced formula for the outlier degree of a data point. Experiments show that the improved algorithm achieves better clustering results on multiple datasets than the original algorithm and other density-based clustering algorithms.
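The abstract does not give the thesis's modified decision value or its inflection-point rule, so neither can be reproduced here. As a rough illustration of the general idea of picking centers automatically from sorted decision values, the sketch below uses a generic largest-gap heuristic. The function name `pick_centers_by_gap` and the heuristic itself are assumptions, not the author's method.

```python
def pick_centers_by_gap(gamma):
    """Return indices of the decision values that lie above the largest drop.

    gamma: list of decision values, one per data point (hypothetical input).
    This is a generic largest-gap heuristic, not the rule used in the thesis.
    """
    # Sort point indices by decreasing decision value.
    order = sorted(range(len(gamma)), key=lambda i: -gamma[i])
    if len(order) < 2:
        return order

    # Drop between each consecutive pair of sorted decision values.
    gaps = [gamma[order[k]] - gamma[order[k + 1]]
            for k in range(len(order) - 1)]

    # Cut at the largest drop; everything before the cut is a center.
    cut = max(range(len(gaps)), key=lambda k: gaps[k])
    return order[:cut + 1]
```

A heuristic like this removes the manual selection step but is sensitive to how the decision values are scaled, which may be one reason the thesis introduces its own decision-value calculation.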
[Degree-granting institution]: Inner Mongolia University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP311.13


Article ID: 1877838


Article link: http://sikaile.net/shoufeilunwen/xixikjs/1877838.html

