Research on Data Normalization Methods for Improving the Training Efficiency of SVM
[Abstract]: The support vector machine (SVM) is a machine learning method grounded in statistical learning theory, the structural risk minimization principle, and VC-dimension theory. Its excellent classification ability has made it widely used in many fields over recent decades, and it remains one of the most active research areas in machine learning. Data normalization is a necessary preprocessing step for SVM training; commonly used strategies include scaling attributes into [-1, +1] and standardizing them to N(0,1). However, the existing literature offers no scientific basis for these common normalization methods. This paper studies SVM training with the sequential minimal optimization (SMO) algorithm and finds that the Gaussian kernel function is affected by the magnitudes of the sample attribute values: when attribute values are too large or too small, the discriminating power of the Gaussian kernel is reduced and the separating surface becomes too rugged. The paper explores the internal mechanism of data normalization through empirical experiments, comparing the effects of normalized and un-normalized data on training efficiency and model prediction ability. The data are trained with SVM, and the changes of the objective function value with the number of iterations, the training time, the model test results, and the k-fold cross-validation (k-CV) performance are recorded. The algorithm is implemented in C++11, which computes and outputs the objective function value and its variation, the training time, and the test accuracy. Representative studies of the SMO algorithm with the Gaussian kernel are analyzed in depth, and a well-performing value of the Gaussian kernel radius and a tolerance for violation of the KKT conditions are determined.
(1) The results show that the determined values of the two parameters (denoted lambda and kappa in this work, i.e. the kernel radius and the KKT-violation tolerance) achieve the best generalization ability, and analysis of the output curves supports the conclusion that data preprocessing can improve the training efficiency of SVM. (2) Data preprocessing methods are studied in depth, especially maximum-value normalization, median-value normalization, and standard-score normalization; these three data normalization methods are applied to the SVM classifier. The experimental results show that data normalization can compensate for a poorly matched Gaussian kernel radius and make the Gaussian kernel better suited to SVM classification. (3) On standard experimental data sets, the three data normalization methods are used to preprocess the SVM training data, a variety of experiments are designed, and the training time and test accuracy are recorded and compared in detail using k-CV. (4) By analyzing the effect of data normalization on SVM training efficiency and comparing the differences in classification ability, a criterion for data normalization that improves SVM training efficiency is proposed: keep each data attribute within a conventional, comparable range, such as [-0.5, +0.5] to [-5, +5], or a standardized distribution such as N(0,1). Extensive experimental analysis verifies that data normalization can effectively improve the training efficiency of SVM. This paper thus provides a scientific basis for data normalization in SVM and in machine learning algorithms generally.
【Degree-granting institution】: Shandong Normal University (山東師范大學)
【Degree level】: Master
【Year conferred】: 2017
【CLC number】: TP181
【References】
Related journal articles (10 items)
1 柴巖, 王慶菊. An SMO algorithm with boundary-vector-based sample selection [J]. 系統工程, 2015(06).
2 劉洛霞. Multivariate function regression analysis based on SVM (in English) [J]. 電光與控制, 2013(06).
3 王新志, 陳偉, 祝明坤. The influence of sample data normalization on GPS height conversion [J]. 測繪科學, 2013(06).
4 趙長春, 姜曉愛, 金英漢. An improved SMO algorithm for nonlinear-regression support vector machines [J]. 北京航空航天大學學報, 2014(01).
5 劉學藝, 李平, 郜傳厚. A fast leave-one-out cross-validation algorithm for the extreme learning machine [J]. 上海交通大學學報, 2011(08).
6 顧亞祥, 丁世飛. Advances in support vector machine research [J]. 計算機科學, 2011(02).
7 A new data normalization method for unsupervised anomaly intrusion detection [J]. Journal of Zhejiang University-Science C (Computers & Electronics), 2010(10).
8 濮定國, 金中. A new Lagrange multiplier method [J]. 同濟大學學報(自然科學版), 2010(09).
9 駱世廣, 駱昌日. Strategies for speeding up the training of the SMO algorithm [J]. 計算機工程與應用, 2007(33).
10 談效俊, 張永新, 錢敏平, 張幼怡, 鄧明華. A comparative study of microarray data normalization methods [J]. 生物化學與生物物理進展, 2007(06).
Related doctoral dissertations (1 item)
1 段會川. Research on the effective ranges of the hyper-parameters of Gaussian-kernel support vector classification machines [D]. 山東師范大學, 2012.
Related master's theses (2 items)
1 王正鵬. Data normalization and semantic-relation similarity computation based on random walks [D]. 復旦大學, 2012.
2 于丹. Several studies on the normalization of gene chip data [D]. 浙江大學, 2008.
Article ID: 2181215
Link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2181215.html