

Comparative Study of Classifier Performance and Method Improvement for Class-Imbalanced Data

Published: 2018-08-30 10:47
【Abstract】: Data with imbalanced class distributions are widespread in the real world. In some domains, correctly classifying minority-class samples matters far more than correctly classifying majority-class samples. However, most classical classification algorithms assume that the prior class distribution is balanced or that all misclassifications cost the same. When these algorithms are applied to imbalanced data, the information carried by the minority class is often masked by the majority class, so the minority-class error rate is far higher than the majority-class error rate. Classification of imbalanced-class data has therefore attracted growing attention. Because the class counts in an imbalanced dataset are severely skewed, or the classes are unevenly distributed, applying a traditional classifier directly yields poor accuracy on the minority class. This thesis therefore works at two levels. At the data level, it uses hybrid sampling to change the class distribution. At the algorithm level, it proposes an improved selective ensemble algorithm based on a hybrid genetic algorithm. Together these improve overall classification performance and raise minority-class accuracy. The main research work and results are as follows:
(1) Selecting the base classifier. On the WEKA platform, four classifiers are compared on balanced and imbalanced datasets for classification performance and stability: the C4.5 decision tree, the BP neural network, naive Bayes, and the support vector machine.
(2) Effect of selective ensembles on balanced and imbalanced datasets. Using WEKA, the accuracy of single classifiers and ensemble classifiers is compared on all datasets to find the base-classifier combinations that ensemble learning can improve the most. The performance gap on imbalanced datasets between selective and non-selective ensemble experiments verifies that selective ensembles are feasible. The difference in ensemble performance between balanced and imbalanced datasets shows that imbalanced datasets must also be modified at the data level.
(3) A comprehensive ensemble method for imbalanced-data classification is proposed. Guided by the distribution characteristics of class-imbalanced data, it first builds a relatively balanced training set by combining SMOTE oversampling with Bootstrap undersampling. It then uses a hybrid genetic algorithm to select C4.5 decision-tree base classifiers for ensemble learning, improving classification of the minority class in imbalanced datasets.
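The abstract's premise is that minority-class errors stay hidden when a classifier is judged by overall accuracy. A small worked example makes this concrete. The sketch is illustrative only. The 5:95 label vector is hypothetical. G-mean (the geometric mean of the two per-class recalls) is a commonly used imbalance metric that the abstract itself does not name.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose prediction matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_recall(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """Fraction of the samples of class `label` that are predicted as `label`."""
    hits = [p == label for t, p in zip(y_true, y_pred) if t == label]
    return sum(hits) / len(hits) if hits else 0.0


def g_mean(y_true: Sequence[int], y_pred: Sequence[int], minority: int = 1, majority: int = 0) -> float:
    """Geometric mean of minority and majority recall; 0 if either class is missed entirely."""
    r_min = class_recall(y_true, y_pred, minority)
    r_maj = class_recall(y_true, y_pred, majority)
    return (r_min * r_maj) ** 0.5


if __name__ == "__main__":
    # Hypothetical 5:95 imbalanced test set (label 1 = minority class).
    y_true = [1] * 5 + [0] * 95
    # A degenerate classifier that always predicts the majority class.
    y_pred = [0] * 100
    print(accuracy(y_true, y_pred))         # 0.95
    print(class_recall(y_true, y_pred, 1))  # 0.0
    print(g_mean(y_true, y_pred))           # 0.0
```

The classifier scores 95% accuracy while never recognizing a single minority sample. Per-class measures such as minority recall or G-mean expose the failure.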
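Item (3) builds a relatively balanced training set by pairing SMOTE oversampling of the minority class with Bootstrap undersampling of the majority class. The pure-Python sketch below shows one way to combine the two. It is not the thesis's implementation, and several choices in it are hypothetical and made for illustration:

- the function names `smote`, `bootstrap_undersample`, and `hybrid_resample`;
- the binary-label assumption;
- the default of `k = 5` neighbours;
- resizing both classes to the midpoint of the two class sizes.

```python
import random
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_new: int, k: int, rng: random.Random) -> List[Vector]:
    """SMOTE oversampling: create n_new synthetic minority samples.

    Each synthetic sample is a random point on the segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: _sq_dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def bootstrap_undersample(majority: List[Vector], n_keep: int, rng: random.Random) -> List[Vector]:
    """Bootstrap undersampling: draw n_keep majority samples with replacement."""
    return [rng.choice(majority) for _ in range(n_keep)]


def hybrid_resample(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    minority_label: int,
    k: int = 5,
    target: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Vector], List[int]]:
    """Resize both classes of a binary dataset to `target` samples each.

    Hypothetical default: `target` is halfway between the two class sizes.
    """
    rng = rng or random.Random(0)
    minority = [list(x) for x, lab in zip(X, y) if lab == minority_label]
    majority = [list(x) for x, lab in zip(X, y) if lab != minority_label]
    majority_label = next(lab for lab in y if lab != minority_label)
    if target is None:
        target = (len(minority) + len(majority)) // 2
    new_min = minority + smote(minority, max(0, target - len(minority)), k, rng)
    new_maj = bootstrap_undersample(majority, target, rng)
    y_bal = [minority_label] * len(new_min) + [majority_label] * len(new_maj)
    return new_min + new_maj, y_bal


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.8]] + [[5.0 + i, 5.0] for i in range(10)]
    y = [1] * 3 + [0] * 10
    X_bal, y_bal = hybrid_resample(X, y, minority_label=1, rng=random.Random(42))
    print(y_bal.count(1), y_bal.count(0))  # 6 6
```

Meeting in the middle keeps more majority-class information than pure undersampling. It also adds fewer synthetic points than oversampling the minority class all the way up to the majority size. As with any resampling, it should be applied only to the training portion of each split so that synthetic points do not leak into evaluation data.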
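Item (3) also selects C4.5 base classifiers with a hybrid genetic algorithm. The abstract does not give that algorithm's encoding, operators, or fitness function. What follows is therefore a generic selective-ensemble sketch under these assumptions:

- Each base classifier has already produced 0/1 predictions on a validation set, with label 1 as the minority class.
- A chromosome is a bit mask over the classifiers.
- Fitness is the balanced accuracy of the masked majority vote.
- "Hybrid" means a standard GA (tournament selection, one-point crossover, bit-flip mutation, elitism) followed by a greedy bit-flip local search.

All names and parameter values are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence

Mask = List[int]


def vote(preds: Sequence[Sequence[int]], mask: Sequence[int]) -> List[int]:
    """Majority vote of the base classifiers whose mask bit is 1 (ties go to class 1)."""
    chosen = [p for p, bit in zip(preds, mask) if bit]
    return [1 if 2 * sum(col) >= len(chosen) else 0 for col in zip(*chosen)]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the recall on class 1 (minority) and the recall on class 0 (majority)."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t == 0]
    tpr = sum(pos) / len(pos) if pos else 0.0
    tnr = (len(neg) - sum(neg)) / len(neg) if neg else 0.0
    return (tpr + tnr) / 2


def fitness(mask: Sequence[int], preds: Sequence[Sequence[int]], y_val: Sequence[int]) -> float:
    """Validation balanced accuracy of the voted sub-ensemble; an empty ensemble scores 0."""
    return balanced_accuracy(y_val, vote(preds, mask)) if any(mask) else 0.0


def local_search(mask: Mask, fit: Callable[[Mask], float]) -> Mask:
    """Greedy hill climbing: keep any single bit flip that strictly raises fitness."""
    best, best_f = mask[:], fit(mask)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            cand = best[:]
            cand[i] ^= 1
            f = fit(cand)
            if f > best_f:
                best, best_f, improved = cand, f, True
    return best


def ga_select(preds, y_val, pop_size: int = 20, generations: int = 30,
              p_mut: float = 0.05, rng: Optional[random.Random] = None) -> Mask:
    """Pick a subset of base classifiers with a GA, then refine the best mask by local search."""
    rng = rng or random.Random(0)
    n = len(preds)

    def fit(m: Sequence[int]) -> float:
        return fitness(m, preds, y_val)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        next_pop = [pop[0][:], pop[1][:]]  # elitism: carry the two best masks over
        while len(next_pop) < pop_size:
            a = max(rng.sample(pop, 2), key=fit)  # binary tournament for each parent
            b = max(rng.sample(pop, 2), key=fit)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ 1 if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return local_search(max(pop, key=fit), fit)


if __name__ == "__main__":
    # Hypothetical validation predictions of four base classifiers (rows) on six samples.
    y_val = [1, 1, 0, 0, 0, 0]
    preds = [
        [1, 1, 0, 0, 0, 0],  # perfect
        [0, 0, 0, 0, 0, 0],  # always predicts the majority class
        [1, 1, 1, 1, 1, 1],  # always predicts the minority class
        [0, 0, 1, 1, 1, 1],  # always wrong
    ]
    best = ga_select(preds, y_val)
    print(best, fitness(best, preds, y_val))
```

Voting with a chosen subset rather than with every trained tree is what makes the ensemble selective. The local-search pass only accepts strict improvements, so it can refine the GA's best mask but never make it worse.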
【Degree-granting institution】: Dalian Maritime University
【Degree level】: Master's
【Year awarded】: 2017
【Classification number】: TP181

