Imbalanced Data Processing Algorithm Based on Boundary Mixed Sampling
Published: 2018-08-06 21:25
[Abstract]: To address the poor classification performance on imbalanced data, a new imbalanced-data processing method based on boundary mixed sampling (BMS) is proposed. First, the "coefficient of variation" is introduced to divide the samples into a boundary region and a non-boundary region. Next, minority-class samples in the boundary region are oversampled, and majority-class samples in the non-boundary region are randomly undersampled, so that the training data become approximately balanced. Experiments on 7 public datasets show that BMS improves classification performance by about 5% on average over 3 other popular imbalanced-data processing methods. The method can therefore be widely applied to the processing and classification of imbalanced data.
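The abstract gives only the outline of BMS. The Python sketch below shows one way the three steps (boundary detection, boundary oversampling, non-boundary undersampling) could fit together. Everything specific in it is an assumption and not the paper's method. This includes the boundary test, which takes the coefficient of variation (CV = σ/μ) of a sample's mean distance to its k nearest same-class neighbours and its mean distance to its k nearest other-class neighbours. It also includes the hypothetical names `boundary_mask` and `bms_resample`, the parameters `k` and `threshold`, the SMOTE-style interpolation used for oversampling, and the choice to close half the class gap by each kind of sampling.

```python
import math
import random


def _mean_of_k_smallest(values, k):
    """Mean of the k smallest values (0.0 if there are none)."""
    smallest = sorted(values)[:k]
    return sum(smallest) / len(smallest) if smallest else 0.0


def boundary_mask(X, y, k=3, threshold=0.2):
    """Return one flag per sample: True = boundary region, False = non-boundary.

    HYPOTHETICAL criterion (not taken from the paper): a is the mean distance
    to the k nearest same-class neighbours, b the mean distance to the k nearest
    other-class neighbours. The coefficient of variation of {a, b} is
    std / mean = |a - b| / (a + b); a small value means the sample is about as
    close to the other class as to its own, so it is flagged as boundary.
    """
    mask = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        same = [math.dist(xi, xj) for j, (xj, yj) in enumerate(zip(X, y))
                if j != i and yj == yi]
        other = [math.dist(xi, xj) for xj, yj in zip(X, y) if yj != yi]
        a = _mean_of_k_smallest(same, k)
        b = _mean_of_k_smallest(other, k)
        cv = abs(a - b) / (a + b) if a + b > 0 else 0.0
        mask.append(cv < threshold)
    return mask


def bms_resample(X, y, minority=1, k=3, threshold=0.2, seed=0):
    """Oversample boundary minority samples, randomly undersample
    non-boundary majority samples, and return the resampled (X, y)."""
    rng = random.Random(seed)
    mask = boundary_mask(X, y, k, threshold)
    min_idx = [i for i, c in enumerate(y) if c == minority]
    maj_idx = [i for i, c in enumerate(y) if c != minority]
    gap = len(maj_idx) - len(min_idx)
    if gap <= 0:
        return list(X), list(y)

    # HYPOTHETICAL split: close half the gap by each kind of sampling.
    n_new = gap // 2
    n_drop = gap - n_new

    # Oversampling (assumed SMOTE-style): interpolate from a boundary minority
    # sample toward one of its k nearest minority neighbours.
    border_min = [i for i in min_idx if mask[i]]
    synthetic = []
    if border_min and len(min_idx) > 1:
        for _ in range(n_new):
            i = rng.choice(border_min)
            neighbours = sorted((m for m in min_idx if m != i),
                                key=lambda m: math.dist(X[i], X[m]))[:k]
            j = rng.choice(neighbours)
            lam = rng.random()
            synthetic.append(tuple(p + lam * (q - p) for p, q in zip(X[i], X[j])))

    # Undersampling: random removal restricted to non-boundary majority samples.
    inner_maj = [i for i in maj_idx if not mask[i]]
    dropped = set(rng.sample(inner_maj, min(n_drop, len(inner_maj))))

    X_out = [x for i, x in enumerate(X) if i not in dropped] + synthetic
    y_out = [c for i, c in enumerate(y) if i not in dropped] + [minority] * len(synthetic)
    return X_out, y_out
```

For example, `bms_resample(X, y, minority=1)` takes a list of feature tuples `X` and 0/1 labels `y` and returns a data set in which the class gap has shrunk. The sketch never touches boundary majority samples or original minority samples. The result is only approximately balanced: it stays short of perfect balance whenever there are too few boundary minority samples or non-boundary majority samples. A faithful implementation would replace the boundary test, `threshold`, and the gap split with the paper's own definitions.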
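[Author affiliations]: School of Information Science and Technology, Northwest University; School of Economics and Management, Northwest University; School of Mathematics, Northwest University
[Funding]: Natural Science Special Project of the Scientific Research Program of the Shaanxi Provincial Department of Education (15JK1738); Natural Science Foundation of Shaanxi Province (2014JQ8367)
[Classification number]: TP311.13
Article ID: 2169036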