Classifier Selection Ensembles and Their Application in Gene Data Analysis
Topic: ensemble learning + selective ensemble. Source: Master's thesis, Dalian University of Technology, 2016
[Abstract]: Ensemble learning combines the outputs of multiple classifiers trained on the same problem to obtain better classification performance than any single classifier. Not every classifier in an ensemble contributes positively to the final result, however; selective ensemble methods try to pick out a subset of well-performing classifiers, improving overall performance while reducing the memory and computational cost of the ensemble. This thesis proposes two selective ensemble methods: a static selection based on the kappa coefficient and a dynamic selection based on the firefly algorithm. The static method suits small datasets, while the dynamic method suits large ones. Before classifier selection, genes relevant to classification are first chosen by a rank-aggregation algorithm based on data perturbation; the genes are then grouped by affinity propagation clustering, and one gene is randomly drawn from each group to form the final gene subset. The resulting subset is thus both relevant to classification and low in inter-gene correlation. After the base classifiers are trained, the first method retains the classifiers whose kappa value exceeds a threshold, while the second uses a clustering-like procedure to pick classifiers that are both accurate and mutually diverse. Experiments on five gene datasets show that the two proposed methods achieve higher accuracy than the compared and classical methods. The first method quickly selects a suitable classifier subset when the dataset is small; the second saves more time on large datasets while still obtaining good classification results.
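The abstract does not give implementation details for the kappa-based static selection, so the following is only a minimal sketch. It assumes kappa means Cohen's kappa computed between each base classifier's validation predictions and the true labels, and the threshold value of 0.5 is illustrative, not taken from the thesis:

```python
# Hypothetical sketch of kappa-threshold static classifier selection.
# Assumptions: kappa is Cohen's kappa against validation labels;
# the 0.5 threshold is illustrative only.

def cohen_kappa(pred, truth):
    """Cohen's kappa: agreement between two label sequences, chance-corrected."""
    n = len(truth)
    labels = set(pred) | set(truth)
    p_o = sum(p == t for p, t in zip(pred, truth)) / n          # observed agreement
    p_e = sum((pred.count(l) / n) * (truth.count(l) / n)        # chance agreement
              for l in labels)
    return 1.0 if p_e == 1.0 else (p_o - p_e) / (1.0 - p_e)

def select_by_kappa(preds_per_clf, truth, threshold=0.5):
    """Return indices of classifiers whose kappa exceeds the threshold."""
    return [i for i, p in enumerate(preds_per_clf)
            if cohen_kappa(p, truth) > threshold]
```

For example, given three classifiers (one perfect, one that predicts a constant class, one mostly correct), only the perfect and the mostly-correct one survive a 0.5 threshold, since the constant classifier's chance-corrected agreement is zero.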
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Master's
[Year conferred]: 2016
[CLC number]: TP18
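The thesis does not describe its firefly-based dynamic selection in the abstract beyond naming the algorithm. As a rough illustration only, a binary firefly search over classifier subsets could look like the sketch below; the majority-vote fitness, the sigmoid binarization, and all parameter values are assumptions, not details from the thesis:

```python
# Hypothetical binary-firefly sketch for selecting a classifier subset.
# Assumptions: fitness = majority-vote accuracy on validation data;
# sigmoid binarization of continuous positions; illustrative parameters.
import math
import random

def majority_vote_acc(mask, preds, truth):
    """Accuracy of the majority vote over the classifiers selected by mask."""
    chosen = [p for p, m in zip(preds, mask) if m]
    if not chosen:
        return 0.0
    correct = 0
    for i, t in enumerate(truth):
        votes = [p[i] for p in chosen]
        if max(set(votes), key=votes.count) == t:
            correct += 1
    return correct / len(truth)

def firefly_select(preds, truth, n_fireflies=8, iters=30,
                   gamma=1.0, beta0=1.0, alpha=0.3, seed=0):
    """Binary firefly search for a classifier subset with high vote accuracy."""
    rng = random.Random(seed)
    n = len(preds)

    def binarize(x):
        # Sigmoid-threshold each continuous position into a 0/1 selection bit.
        return [1 if 1.0 / (1.0 + math.exp(-10.0 * (v - 0.5))) > rng.random() else 0
                for v in x]

    pos = [[rng.random() for _ in range(n)] for _ in range(n_fireflies)]
    masks = [binarize(p) for p in pos]
    bright = [majority_vote_acc(m, preds, truth) for m in masks]
    best_i = max(range(n_fireflies), key=lambda k: bright[k])
    best_mask, best_b = masks[best_i][:], bright[best_i]

    for _ in range(iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if bright[j] > bright[i]:
                    # Move firefly i toward the brighter firefly j; attraction
                    # decays with squared distance, plus a random perturbation.
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    pos[i] = [min(1.0, max(0.0, xi + beta * (xj - xi)
                                           + alpha * (rng.random() - 0.5)))
                              for xi, xj in zip(pos[i], pos[j])]
            masks[i] = binarize(pos[i])
            bright[i] = majority_vote_acc(masks[i], preds, truth)
            if bright[i] > best_b:
                best_b, best_mask = bright[i], masks[i][:]
    return best_mask, best_b
```

The dynamic flavor of the thesis method would evaluate fitness per query region rather than globally; this sketch only shows the metaheuristic search itself.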