Research on an Improved LMS-KNN Nearest Neighbor Classification Method

Published: 2018-08-15 19:12

[Abstract]: Nearest neighbor classification is one of the classical machine learning algorithms. Because it requires no parameter estimation, is easy to implement, and suits multi-class problems, it has in recent years been widely applied in advertising, chatbots, network security, health care, marketing planning, and other fields. The nearest neighbor classification algorithm based on local mean and class mean (Nearest neighbor classification based on local mean and class mean, LMS-KNN) is an improvement on K-nearest neighbor classification (KNN) that addresses problems such as KNN's sensitivity to outliers and its failure to use the global information in the samples. Although the improved algorithm gains somewhat in classification accuracy and efficiency, it still has drawbacks: imbalanced data degrades the classification accuracy of LMS-KNN, and the algorithm involves many parameter settings, such as the choice of the number of neighbors K, the determination of the weights, and the choice of distance metric. To further improve the classification accuracy of LMS-KNN, this thesis carries out the following work:
1) It summarizes and analyzes several commonly used nearest neighbor classification methods and the local mean and class mean nearest neighbor classification algorithm, compares their principles, advantages, and disadvantages, and briefly introduces the optimization algorithms used in the thesis.
2) To address the effect of imbalanced data on LMS-KNN classification accuracy, it preprocesses the data with an iterative nearest-neighbor oversampling algorithm and then classifies the resulting approximately balanced data set with a semi-supervised local mean and class mean method.
3) It determines the parameters of the LMS-KNN classifier using cross-validation and a traditional iterative algorithm. The cross-validation error of the classifier is first formulated as a model; the weight of the class mean vector is then expressed as a mathematical formula based on objective decision information; finally, a unified iterative method with step-size optimization selects the weights. This improves the classification accuracy and efficiency of the traditional algorithm while balancing subjective and objective decision rules.
4) To optimize the determination of the LMS-KNN parameters, it exploits the ability of the genetic algorithm (Genetic Algorithm) to solve nonlinear, multi-objective, and other complex optimization problems without depending on the specific problem domain, and proposes a local mean and class mean nearest neighbor classification algorithm based on the genetic algorithm. The method takes the class mean weights as the initial population and the classification error as the evaluation function, and selects the best class mean feature weights through genetic iteration. Experimental comparison with traditional KNN, LM-KNN (A local mean based nonparametric classifier), and LMS-KNN shows that the method can effectively find suitable feature weights on UCI data sets and obtain better classification accuracy.
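The abstract does not give the LMS-KNN decision rule or the weight-selection procedure in detail. The sketch below is therefore only an illustration under stated assumptions: it assumes that each class is scored by the distance from the query to the local mean of its k nearest members plus a weight w times the distance to the class mean, and that the class with the smallest score wins. It also replaces the thesis's step-size-optimized iteration and genetic search with a plain leave-one-out comparison over a few candidate weights. All function names, the toy data, and the candidate weights are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def lms_knn_predict(train_x, train_y, query, k=3, w=0.5):
    """Assumed rule: score(c) = d(query, local mean of c) + w * d(query, class mean of c)."""
    best_label, best_score = None, float("inf")
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        # Local mean: mean of the (at most) k members of this class closest to the query.
        nearest = sorted(members, key=lambda p: euclidean(p, query))[:k]
        local_mean = mean_vector(nearest)
        # Class mean: mean of all members, carrying the global information of the class.
        class_mean = mean_vector(members)
        score = euclidean(query, local_mean) + w * euclidean(query, class_mean)
        if score < best_score:
            best_label, best_score = label, score
    return best_label

def loo_error(train_x, train_y, k, w):
    """Leave-one-out cross-validation error rate for a given k and w."""
    errors = 0
    for i in range(len(train_x)):
        rest_x = train_x[:i] + train_x[i + 1:]
        rest_y = train_y[:i] + train_y[i + 1:]
        if lms_knn_predict(rest_x, rest_y, train_x[i], k, w) != train_y[i]:
            errors += 1
    return errors / len(train_x)

def select_weight(train_x, train_y, k, candidates):
    """Pick the candidate weight with the lowest leave-one-out error (first one on ties)."""
    return min(candidates, key=lambda w: loo_error(train_x, train_y, k, w))

# Hypothetical toy data: two well-separated classes in the plane.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
y = ["a", "a", "a", "b", "b", "b"]

print(lms_knn_predict(X, y, [0.5, 0.5], k=2, w=0.5))
print(select_weight(X, y, k=2, candidates=[0.0, 0.5, 1.0]))
```

With w = 0 the assumed rule reduces to a purely local-mean classifier, so the weight controls how much the global class information contributes. A genetic algorithm, as proposed in the thesis, would search this weight space with a population and the same error measure as its evaluation function rather than enumerating a fixed candidate list.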
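[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP181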


Article ID: 2185147


Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2185147.html
