Classifier Performance Comparison and Method Improvement for Class-Imbalanced Data
[Abstract]: Class-imbalanced data are common in the real world. In some domains, correctly classifying minority-class samples matters far more than correctly classifying majority-class samples. However, most classical classification algorithms assume that the class prior distribution is balanced or that all misclassification costs are equal. On imbalanced data, the information carried by minority samples is often swamped by that of the majority samples, so the error rate on the minority class is much higher than on the majority class. The classification of class-imbalanced data has therefore drawn increasing attention. Because the sample counts in an imbalanced dataset are heavily skewed or unevenly distributed, applying a traditional classification algorithm directly to such a dataset yields poor accuracy on the minority class. This thesis therefore uses a hybrid sampling method to change the class distribution at the data level, together with an improved selective ensemble algorithm based on a hybrid genetic algorithm, to improve both overall classification performance and minority-class accuracy. The main research work and results are as follows:
(1) Selecting the base classifier. On the WEKA platform, the classification performance and stability of C4.5 decision trees, BP neural networks, naive Bayes, and support vector machines are compared on balanced and imbalanced datasets.
(2) The effect of selective ensembles on balanced and imbalanced datasets. Using the WEKA platform, the classification accuracy of single classifiers and ensemble classifiers is compared on all datasets to find the base-classifier combinations with the most room for improvement under ensemble learning. The feasibility of selective ensembles is verified by comparing classification performance on imbalanced datasets in selective and non-selective ensemble experiments. The differing performance of ensemble classification on balanced and imbalanced datasets shows that imbalanced datasets need to be adjusted at the data level.
(3) An integrated method for the imbalanced data classification problem is proposed. Based on the distribution characteristics of class-imbalanced data, a relatively balanced training set is constructed by combining SMOTE oversampling with Bootstrap undersampling. C4.5 decision tree base classifiers are then selected by a hybrid genetic algorithm for ensemble learning, in order to improve classification of the minority class in imbalanced datasets.
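The data-level step in (3) can be illustrated with a minimal sketch. This is not the thesis's implementation (the experiments were run on WEKA). It only shows the general idea of combining SMOTE-style interpolation for the minority class with bootstrap sampling of the majority class. The function names, the `target` parameter, and the toy data are all hypothetical, and the sketch assumes at least two minority samples. Training the C4.5 trees and selecting them with the genetic algorithm are not shown.

```python
import random


def smote_oversample(minority, n_new, k=3, rng=None):
    """Create n_new synthetic minority points by SMOTE-style interpolation."""
    if rng is None:
        rng = random.Random(0)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x (excluding x itself),
        # ranked by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position on the segment from x to nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def bootstrap_undersample(majority, n_keep, rng=None):
    """Draw n_keep majority points with replacement (a bootstrap sample)."""
    if rng is None:
        rng = random.Random(0)
    return [rng.choice(majority) for _ in range(n_keep)]


def hybrid_resample(minority, majority, target, k=3, seed=0):
    """Bring both classes to `target` samples each (hypothetical helper)."""
    rng = random.Random(seed)
    n_new = max(0, target - len(minority))
    new_minority = list(minority) + smote_oversample(minority, n_new, k, rng)
    new_majority = bootstrap_undersample(majority, target, rng)
    return new_minority, new_majority


# Toy data: 4 minority points and 40 majority points in two dimensions.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority = [(float(i), float(i % 5)) for i in range(5, 45)]

new_min, new_maj = hybrid_resample(minority, majority, target=12)
print(len(new_min), len(new_maj))  # 12 12
```

Calling `hybrid_resample` repeatedly with different seeds would give a different relatively balanced training set each time, which is one plausible way to train the diverse base classifiers that a selective ensemble then chooses from.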
[Degree-granting institution]: Dalian Maritime University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP181