Research on an Improved K-Nearest Neighbor Method for Classifying Imbalanced Datasets
[Abstract]: Efficiently extracting the information people need from today's vast and rapidly changing data is a difficult problem of the information explosion. Researchers in machine learning, artificial intelligence, and pattern recognition have studied this problem in depth, and classification is one of the important research directions. Years of effort have produced many methods with good classification performance. However, these methods are mainly evaluated by overall error rate, precision, and recall, so on imbalanced datasets they tend to depress the recognition rate of minority classes and of sparsely distributed classes. Because real applications increasingly care about the classification accuracy of minority classes, improving their recognition rate while preserving the overall classification quality of imbalanced datasets has become a hot topic. This thesis studies the K-nearest neighbor (KNN) method for classifying imbalanced datasets. The main contributions are as follows:

(1) To address the slow classification speed caused by the large number of similarity computations the traditional KNN method performs when searching for nearest neighbors, representative samples and thresholds are introduced. For each test sample, nearest neighbors are selected only from classes whose similarity to the corresponding representative sample is not less than that class's threshold, which reduces the amount of computation and improves classification speed without affecting accuracy.
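The pruning idea in (1) can be sketched as follows. The abstract does not give the exact form of the representative samples or thresholds, so this sketch assumes class centroids as representatives and the minimum within-class cosine similarity as each class's threshold; `fit_representatives` and `pruned_knn_predict` are hypothetical names for illustration, not the thesis's implementation.

```python
import numpy as np

def cosine_sim(x, M):
    """Cosine similarity between vector x and each row of matrix M."""
    return (M @ x) / (np.linalg.norm(M, axis=1) * np.linalg.norm(x) + 1e-12)

def fit_representatives(X, y):
    """One representative (here: the centroid) and a similarity threshold per
    class. The threshold is the minimum similarity between the centroid and
    its own class members -- an illustrative choice, not the thesis's rule."""
    reps, thresholds, labels = [], [], []
    for c in np.unique(y):
        Xc = X[y == c]
        centroid = Xc.mean(axis=0)
        reps.append(centroid)
        thresholds.append(cosine_sim(centroid, Xc).min())
        labels.append(c)
    return np.array(reps), np.array(thresholds), np.array(labels)

def pruned_knn_predict(x, X, y, reps, thresholds, rep_labels, k=3):
    """Search neighbors only inside classes whose representative is similar
    enough to the query; fall back to all classes if none pass the test."""
    sims_to_reps = cosine_sim(x, reps)
    keep = rep_labels[sims_to_reps >= thresholds]
    if keep.size == 0:
        keep = rep_labels
    mask = np.isin(y, keep)          # prune whole classes before the search
    Xs, ys = X[mask], y[mask]
    sims = cosine_sim(x, Xs)
    nn = np.argsort(sims)[-k:]       # k most similar remaining samples
    vals, cnts = np.unique(ys[nn], return_counts=True)
    return vals[np.argmax(cnts)]     # majority vote
```

Because whole classes are discarded before the per-sample similarity pass, the number of similarity computations drops from the full training set to only the retained classes, which is where the speedup described above comes from.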
(2) To address the low classification accuracy of the traditional KNN method on imbalanced datasets, class representativeness and sample representativeness are proposed. By giving larger weights to closer nearest neighbors and to minority-class samples, the influence of majority classes and densely distributed classes on the vote is reduced, which improves the classification accuracy of the traditional KNN method on imbalanced datasets. UCI classification datasets are used as experimental data. Comparison of the traditional KNN method with the improved KNN method shows that the improved method raises classification performance to some extent.
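The weighting scheme in (2) can be sketched as follows. The abstract does not state the exact weight formulas, so this sketch assumes an inverse-distance weight for neighbor closeness and an inverse-frequency weight for minority classes; `weighted_knn_predict` is a hypothetical name for illustration.

```python
from collections import Counter

import numpy as np

def weighted_knn_predict(x, X, y, k=5):
    """KNN vote where each neighbor contributes weight 1/distance (closer
    neighbors count more) scaled by an inverse-frequency class weight
    (minority-class samples count more). Illustrative weighting only."""
    counts = Counter(y)
    n = len(y)
    # Inverse-frequency class weights: rarer classes get larger weights.
    class_w = {c: n / (len(counts) * cnt) for c, cnt in counts.items()}
    d = np.linalg.norm(X - x, axis=1)
    nn = np.argsort(d)[:k]           # indices of the k nearest samples
    scores = {}
    for i in nn:
        w = class_w[y[i]] / (d[i] + 1e-9)
        scores[y[i]] = scores.get(y[i], 0.0) + w
    return max(scores, key=scores.get)
```

With plain majority voting, a minority-class query surrounded mostly by majority-class points is easily mislabeled; the combined distance and class weights let a few close minority neighbors outvote more numerous but farther majority neighbors, which is the effect described above.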
【Degree-granting institution】: Southwest Jiaotong University (西南交通大學(xué))
【Degree level】: Master's
【Year awarded】: 2017
【CLC number】: TP311.13