Research on Data Normalization Methods for Improving the Training Efficiency of SVM
[Abstract]: The support vector machine (SVM) is a machine learning method grounded in statistical learning theory, the structural risk minimization principle, and VC-dimension theory. Its excellent classification ability has made it widely used in many fields over recent decades, and it remains one of the most active research areas in machine learning. Data normalization is a necessary preprocessing step for SVM training; commonly used strategies include scaling attributes into [-1, +1] and standardizing them to N(0,1). However, the existing literature offers no scientific basis for these common normalization methods. This paper studies SVM training with the sequential minimal optimization (SMO) algorithm and finds that the Gaussian kernel function is affected by the magnitudes of the sample attribute values: when attribute values are too large or too small, the discriminating power of the Gaussian kernel is reduced and the separating surface becomes too rugged. The paper explores the internal mechanism of data normalization through empirical experiments, comparing the effects of normalized and un-normalized data on training efficiency and model prediction ability. The data are trained with SVM, and the changes of the objective function value with the number of iterations, the training time, the model test results, and the k-fold cross-validation (k-CV) performance are recorded. The algorithm is implemented in C++11, which computes and outputs the objective function value and its variation, the training time, and the test accuracy. Representative studies of the SMO algorithm with the Gaussian kernel are analyzed in depth, and a well-performing value of the Gaussian kernel radius and a tolerance for violation of the KKT conditions are determined.
(1) The results show that the determined values of the two parameters (denoted lambda and kappa in this work, i.e. the kernel radius and the KKT-violation tolerance) achieve the best generalization ability, and analysis of the output curves supports the conclusion that data preprocessing can improve the training efficiency of SVM. (2) Data preprocessing methods are studied in depth, especially maximum-value normalization, median-value normalization, and standard-score normalization; these three data normalization methods are applied to the SVM classifier. The experimental results show that data normalization can compensate for a poorly matched Gaussian kernel radius and make the Gaussian kernel better suited to SVM classification. (3) On standard experimental data sets, the three data normalization methods are used to preprocess the SVM training data, a variety of experiments are designed, and the training time and test accuracy are recorded and compared in detail using k-CV. (4) By analyzing the effect of data normalization on SVM training efficiency and comparing the differences in classification ability, a criterion for data normalization that improves SVM training efficiency is proposed: keep each data attribute within a conventional, comparable range, such as [-0.5, +0.5] to [-5, +5], or a standardized distribution such as N(0,1). Extensive experimental analysis verifies that data normalization can effectively improve the training efficiency of SVM. This paper thus provides a scientific basis for data normalization in SVM and in machine learning algorithms generally.
【Degree-granting institution】: Shandong Normal University (山東師范大學)
【Degree level】: Master
【Year conferred】: 2017
【CLC number】: TP181
【References】
Related journal articles (10 items)
1 柴巖, 王慶菊. An SMO algorithm with boundary-vector-based sample selection [J]. 系統工程, 2015(06).
2 劉洛霞. Multivariate function regression analysis based on SVM (in English) [J]. 電光與控制, 2013(06).
3 王新志, 陳偉, 祝明坤. The influence of sample data normalization on GPS height conversion [J]. 測繪科學, 2013(06).
4 趙長春, 姜曉愛, 金英漢. An improved SMO algorithm for nonlinear-regression support vector machines [J]. 北京航空航天大學學報, 2014(01).
5 劉學藝, 李平, 郜傳厚. A fast leave-one-out cross-validation algorithm for the extreme learning machine [J]. 上海交通大學學報, 2011(08).
6 顧亞祥, 丁世飛. Advances in support vector machine research [J]. 計算機科學, 2011(02).
7 A new data normalization method for unsupervised anomaly intrusion detection [J]. Journal of Zhejiang University-Science C (Computers & Electronics), 2010(10).
8 濮定國, 金中. A new Lagrange multiplier method [J]. 同濟大學學報(自然科學版), 2010(09).
9 駱世廣, 駱昌日. Strategies for speeding up the training of the SMO algorithm [J]. 計算機工程與應用, 2007(33).
10 談效俊, 張永新, 錢敏平, 張幼怡, 鄧明華. A comparative study of microarray data normalization methods [J]. 生物化學與生物物理進展, 2007(06).
Related doctoral dissertations (1 item)
1 段會川. Research on the effective ranges of the hyper-parameters of Gaussian-kernel support vector classification machines [D]. 山東師范大學, 2012.
Related master's theses (2 items)
1 王正鵬. Data normalization and semantic-relation similarity computation based on random walks [D]. 復旦大學, 2012.
2 于丹. Several studies on the normalization of gene chip data [D]. 浙江大學, 2008.
Article ID: 2181215
Link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2181215.html