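Density-Based Clustering Ensemble
Published: 2018-04-21 13:29
Topics: clustering ensemble + semi-supervised clustering ensemble; Source: Southwest Jiaotong University, 2017 master's thesis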
[Abstract]: With the rapid progress of Internet technology, society has entered the era of big data. Everyday human activity generates large amounts of data, and in every field more and more decisions will come to depend on data analysis. How to analyze large volumes of data reasonably and efficiently, and how to find the valuable information behind them, has become a new focus of attention. Clustering ensemble combines two techniques: clustering and ensemble learning. Using such models can improve the accuracy, robustness, and stability of the final result. Adding semi-supervised information during the ensemble process yields a new model, the semi-supervised clustering ensemble model. Under certain conditions, the clustering results obtained by this model may be better than those of the unsupervised clustering ensemble model. This thesis chooses the affinity propagation (AP) algorithm as the base clusterer. In the experiments, the input parameters are varied over multiple runs to obtain diverse base clustering results. An improved maximal information coefficient (Rapid computation of the maximal information coefficient, RapidMic) is then used to compute the correlation between the base clustering results, which is expressed as a similarity matrix. This matrix is used to represent the density relationships within the sample data set. Using isometric feature mapping (Isomap) for dimensionality reduction, the thesis demonstrates that the density relationships of the sample data can be revealed through the base clustering results. By improving the density peaks (DP) algorithm, the thesis designs the k_DP algorithm, which automatically selects the several points with the largest density peaks as cluster centers. Based on k_DP, a new clustering ensemble algorithm, KDPE, is obtained. Experiments show that KDPE achieves better clustering ensemble results than several classical models.
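As a rough illustration of how a set of base clusterings can be turned into a sample-by-sample similarity matrix, the sketch below uses the classic co-association measure: the fraction of base clusterings in which two samples share a cluster. This is a swapped-in stand-in, not the RapidMic measure the thesis uses, and the function name `co_association` and the toy labelings are hypothetical.

```python
from itertools import combinations


def co_association(base_labelings):
    """Return an n x n similarity matrix from a list of base clusterings.

    Entry [i][j] is the fraction of base clusterings that put samples
    i and j in the same cluster. NOTE: this is a stand-in for the
    RapidMic-based similarity named in the abstract, not that method.
    """
    if not base_labelings:
        raise ValueError("need at least one base clustering")
    n = len(base_labelings[0])
    if any(len(labels) != n for labels in base_labelings):
        raise ValueError("all base clusterings must label the same samples")

    m = len(base_labelings)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sim[i][i] = 1.0  # a sample always co-occurs with itself
    for i, j in combinations(range(n), 2):
        agree = sum(1 for labels in base_labelings if labels[i] == labels[j])
        sim[i][j] = sim[j][i] = agree / m
    return sim


# Toy input: three hypothetical base clusterings of five samples, as might
# come from running a base clusterer with different input parameters.
base = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0],
]
similarity = co_association(base)
```

Label values only need to be consistent within one base clustering; the third labeling above uses swapped label names and still contributes correctly.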
Finally, the thesis attempts to add semi-supervised information to the new model in order to improve the clustering ensemble results. DP is modified to obtain semi_DP, and a new semi-supervised clustering ensemble algorithm, SDPE, is designed on top of it. Comparative experiments show that, under certain semi-supervised ratios, SDPE can improve the clustering results and, to some extent, improve on the performance of KDPE.
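The density peaks step that the abstract builds on can be sketched as follows. This is a minimal illustration of the standard density peaks quantities (local density rho and distance-to-denser-sample delta) with centers chosen as the k largest values of gamma = rho * delta. It is not the thesis's k_DP or semi_DP, whose details the abstract does not give. The function name `density_peaks_top_k`, the Gaussian kernel, and the parameters `k` and `d_c` are assumptions made for the example.

```python
import math


def density_peaks_top_k(dist, k, d_c):
    """Cluster samples from a distance matrix with a density-peaks sketch.

    dist : symmetric n x n list of lists of pairwise distances
    k    : number of centers to keep (assumed given here)
    d_c  : cutoff distance of the Gaussian density kernel (assumed given)
    Returns (centers, labels).
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of samples")

    # Local density rho_i: Gaussian-weighted count of nearby samples.
    rho = [
        sum(math.exp(-((dist[i][j] / d_c) ** 2)) for j in range(n) if j != i)
        for i in range(n)
    ]
    # Visit samples from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    # delta_i: distance to the nearest sample that is denser than i;
    # parent[i] records which sample that is.
    delta = [0.0] * n
    parent = [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # densest sample: its largest distance
            continue
        j = min(order[:pos], key=lambda h: dist[i][h])
        delta[i], parent[i] = dist[i][j], j

    # Rank by gamma = rho * delta and keep the k largest as centers.
    # sorted() is stable, so ties keep density order and the densest
    # sample (which has no parent) is always among the centers.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(order, key=lambda i: -gamma[i])[:k]

    # Assign labels in density order: each non-center inherits the label
    # of its nearest denser neighbor, which has already been labeled.
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return centers, labels


# Toy data: two well-separated groups of points on a line.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
centers, labels = density_peaks_top_k(dist, k=2, d_c=2.0)
```

To apply this to a similarity matrix such as the one the abstract describes, one possible conversion is distance = 1 - similarity; that choice is also an assumption, not something the abstract states.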
[Degree-granting institution]: Southwest Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13
[Similar literature]
Related master's theses (top 1)
1 褚睿鴻 (Chu Ruihong); Density-Based Clustering Ensemble [D]; Southwest Jiaotong University; 2017
Article ID: 1782635
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1782635.html