Analysis and Empirical Study of the Key Factors Affecting Projection Pursuit Clustering Modeling
Published: 2019-05-24 16:12
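[Abstract]: This paper discusses the characteristics and differences of six objective functions that arose from different understandings of the basic idea of projection pursuit clustering (PPC) modeling proposed by Friedman et al., analyzes the differences and relationships among three normalization preprocessing methods for sample data, and explains the essence and implications of four schemes for choosing the value of R. Empirical study and theoretical analysis show that the objective function Q(a)=S_z*D_z is not only the most widely used but also best reflects the basic idea of projection pursuit; the objective function Q(a)=S_z+D_z suffers from large terms swamping small ones; the objective function Q(a)=1/S_z+μ*D_z~* applies only to large samples with high similarity and does not achieve better results; the objective functions Q(a)=S_z*C*E and Q(a)=S_z*D_z*E add the information entropy of the weights and of the sample projection values but do not achieve better clustering results; and the objective function Q(a)=S_z does not conform to the basic modeling idea of PPC. The choice of normalization preprocessing method has a significant effect on the modeling results: maximum-value normalization better preserves the original structural characteristics of the sample data, range normalization helps weaken the weight differences among indicators, and mean-removal normalization can weaken the influence of outliers. The local density window radius R also has a significant effect on the modeling results. Schemes with a small R (R≤0.1S_z) are better at distinguishing samples but are unfavorable for clustering, and the optimization sometimes fails to find the true global optimum. For the scheme with a large R (2m≥R≥r_(max)), the premise, the derivation, and the results are all wrong. The scheme R=(r_(i,j))_((k)) is meaningful only in the special case where the maximum distance between samples within a class is smaller than the minimum distance between samples of different classes. Choosing a moderate R in the range r_(max)/5≤R≤r_(max)/3 is reasonable and is consistent with the idea of Friedman et al. for choosing a reasonable value of R.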
[Abstract]: This paper discusses the characteristics and differences of the six objective functions proposed because of the different understanding of the basic idea of projection pursuit clustering (PPC) modeling proposed by Friedman et al., and analyzes the differences and relations among the three normalization preprocessing methods of sample data. This paper expounds the essence and connotation of four different R value schemes. Through empirical research and theoretical analysis, it is found that the objective function Q(a) = S_z*D_z is not only the most widely used, but also the most able to reflect the basic idea of projection pursuit. The objective function Q(a) = S...
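The objective function Q(a) = S_z*D_z named in the abstract can be illustrated with a short sketch. The abstract gives only the names of the quantities, so the formulas below are assumptions based on the form commonly used in the PPC literature: S_z is the sample standard deviation of the projected values z_i, and D_z is the sum of (R - r_ij) over all pairs with r_ij = |z_i - z_j| < R. The three normalization formulas, the function names `normalize` and `ppc_objective`, and the default R = r_max/4 (one point inside the range r_max/5 ≤ R ≤ r_max/3) are likewise illustrative choices, not the paper's code.

```python
import numpy as np


def normalize(X, method="range"):
    """Column-wise normalization; the three formulas are assumed, not taken from the paper."""
    X = np.asarray(X, dtype=float)
    if method == "max":      # maximum-value normalization (assumes positive indicators)
        return X / X.max(axis=0)
    if method == "range":    # range (min-max) normalization
        lo, hi = X.min(axis=0), X.max(axis=0)
        return (X - lo) / (hi - lo)
    if method == "zscore":   # mean-removal (standardization) normalization
        return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    raise ValueError(f"unknown method: {method}")


def ppc_objective(a, X, R=None):
    """Return (Q, S_z, D_z, R) for projection direction a, with Q = S_z * D_z."""
    a = np.asarray(a, dtype=float)
    a = a / np.linalg.norm(a)                # enforce the unit-length constraint on a
    z = np.asarray(X, dtype=float) @ a       # one-dimensional projected values z_i
    s_z = z.std(ddof=1)                      # spread of the projected values
    r = np.abs(z[:, None] - z[None, :])      # pairwise distances r_ij
    if R is None:
        R = r.max() / 4.0                    # assumed moderate value in [r_max/5, r_max/3]
    d_z = np.sum(np.where(r < R, R - r, 0.0))  # local density within window radius R
    return s_z * d_z, s_z, d_z, R
```

In practice the direction a would be found by maximizing Q over unit vectors with a global optimizer. Because S_z and D_z can differ by orders of magnitude, printing both alongside Q is a simple way to see why a sum S_z + D_z can be dominated by the larger term.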
Article No.: 2485000
Article link: http://sikaile.net/kejilunwen/yysx/2485000.html