

Research on an Improved K-Nearest Neighbor Method for Imbalanced Dataset Classification

Published: 2018-09-19 15:34
[Abstract]: In today's age of information explosion, efficiently extracting the information people need from large, complex, and constantly changing data is a pressing problem. To address it, researchers in machine learning, artificial intelligence, and pattern recognition have studied the problem in depth, and classification is one of the most important research directions. After years of work, many classification methods with good performance have been applied to classification problems. However, these methods mainly take overall measures such as misclassification rate, accuracy, and recall as the classification objective, and on imbalanced datasets these measures tend to lower the recognition rate for minority-class samples and for samples from sparsely distributed classes. Because practical applications place growing weight on classifying the minority class correctly, improving the minority-class recognition rate while preserving the overall classification quality on imbalanced datasets is an active research topic.

This thesis studies the K-nearest neighbor method for classifying imbalanced datasets. The main work is as follows:

(1) The traditional K-nearest neighbor method classifies slowly because finding the nearest neighbors requires a large number of similarity computations. To address this, representative samples and thresholds are introduced. The nearest neighbors of a test sample are selected only from those classes whose representative sample has a similarity to the test sample that is not less than the corresponding class threshold. This reduces the amount of computation and increases classification speed without affecting classification accuracy.

(2) The traditional K-nearest neighbor method has low classification accuracy on imbalanced datasets. To address this, class representativeness and sample representativeness are proposed. Neighbor samples that are highly representative of their class, and minority-class samples, are given larger weights, which weakens the influence of majority classes and densely distributed classes on the decision and so improves the accuracy of the traditional K-nearest neighbor method on imbalanced datasets.

The experiments use UCI classification datasets. Comparing the performance metrics of the traditional and improved K-nearest neighbor methods shows that the improved method raises classification performance to some extent.
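The abstract does not give the formulas behind these two improvements, so the following is only a minimal sketch of the general idea, not the thesis's actual method. Every concrete choice is an assumption made for illustration: the similarity measure (1 / (1 + Euclidean distance)), the representative sample (class centroid), the class threshold (lowest similarity between a class's own samples and its centroid), the class weight (inverse class frequency), and the sample weight (similarity of a neighbor to its class centroid). The function names `similarity`, `fit`, and `predict` are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def similarity(a, b):
    """Assumed similarity measure: 1 / (1 + Euclidean distance), in (0, 1]."""
    return 1.0 / (1.0 + math.dist(a, b))


def fit(samples, labels):
    """Compute one representative sample and one threshold per class.

    Assumptions (not from the source): the representative is the class
    centroid, and the threshold is the lowest similarity between that
    centroid and any training sample of the same class.
    """
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    reps, thresholds = {}, {}
    for y, xs in by_class.items():
        reps[y] = tuple(sum(col) / len(xs) for col in zip(*xs))
        thresholds[y] = min(similarity(x, reps[y]) for x in xs)
    return reps, thresholds


def predict(x, samples, labels, reps, thresholds, k=3):
    # Improvement (1): search for neighbors only in classes whose
    # representative is at least as similar to x as the class threshold.
    candidates = {y for y in reps if similarity(x, reps[y]) >= thresholds[y]}
    if not candidates:
        # Assumed fallback: if no class passes, search all classes.
        candidates = set(reps)
    pool = [(similarity(x, s), s, y)
            for s, y in zip(samples, labels) if y in candidates]
    neighbors = sorted(pool, key=lambda t: t[0], reverse=True)[:k]

    # Improvement (2): weighted vote that favors minority-class neighbors
    # and neighbors that are representative of their class.
    counts = Counter(labels)
    votes = defaultdict(float)
    for sim, s, y in neighbors:
        class_weight = len(labels) / counts[y]   # assumed: inverse class frequency
        sample_weight = similarity(s, reps[y])   # assumed: closeness to the centroid
        votes[y] += sim * class_weight * sample_weight
    return max(votes, key=votes.get)


# Toy imbalanced dataset: six majority samples, two minority samples.
samples = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
           (3, 3), (3.2, 3.1)]
labels = ["maj"] * 6 + ["min"] * 2
reps, thresholds = fit(samples, labels)

print(predict((3.05, 3.0), samples, labels, reps, thresholds))
print(predict((0.4, 0.4), samples, labels, reps, thresholds))
```

In this sketch the pruning step means a test sample is compared against one representative per class first, and full similarity computations are done only for the classes that pass. How much work this saves depends on how the thresholds are set, which the abstract leaves unspecified.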
[Degree-granting institution]: Southwest Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2250543.html

