A Study of Data Normalization for SVM Training
Published: 2018-09-12 12:00
[Abstract]: Data normalization is a required preprocessing step for training support vector machines (SVMs). Commonly used methods include scaling to [-1, +1] and standardizing to N(0, 1), but the authors found no study in the existing literature of the scientific basis for these methods. This paper uses empirical experiments to examine why data should be normalized and how normalizing versus not normalizing affects training efficiency and the predictive ability of the model. Standard datasets are selected, and SVMs are trained on the data in four conditions: original and unnormalized, normalized by different methods, artificially inverse-normalized, and restricted to arbitrarily chosen attribute columns. For each condition, the change in objective function value over iterations, the training time, the model test results, and the k-fold cross-validation (k-CV) performance are recorded. The results show that normalization methods that restrict data values to a conventional range, such as [-0.5, +0.5] through [-5, +5] or N(0, 1) through N(0, 5), all obtain the best prediction model with the shortest training time. This work provides a scientific basis for data normalization for SVMs and for machine learning algorithms in general.
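The two families of methods named in the abstract, interval scaling such as [-1, +1] and standardization such as N(0, 1), can be illustrated with a minimal sketch. This is not the authors' code: the function names, the sample column, and the reading of the second parameter in N(0, 5) as a target standard deviation (rather than a variance) are assumptions made here for illustration only.

```python
import math


def scale_to_range(values, lo=-1.0, hi=1.0):
    """Linearly map one attribute column onto [lo, hi] (min-max scaling)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant column has no spread; map every entry to the midpoint.
        return [(lo + hi) / 2.0 for _ in values]
    span = vmax - vmin
    return [lo + (hi - lo) * (v - vmin) / span for v in values]


def standardize(values, target_std=1.0):
    """Shift a column to mean 0 and rescale it to target_std.

    Assumption: target_std plays the role of the second parameter in
    N(0, 1) ... N(0, 5); the abstract does not say whether that parameter
    is a standard deviation or a variance.
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation (divide by n).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0 for _ in values]
    return [target_std * (v - mean) / std for v in values]


# Hypothetical attribute column, not taken from the paper's datasets.
column = [2.0, 4.0, 6.0, 10.0]
scaled = scale_to_range(column)       # [-1.0, -0.5, 0.0, 1.0]
zscored = standardize(column)         # mean 0, standard deviation 1
print(scaled)
print(zscored)
```

In practice each attribute column of the training set would be transformed independently, and the minimum, maximum, mean, and standard deviation computed on the training data would be reused for the test data.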
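[Author affiliations]: School of Information Science and Engineering, Shandong Normal University; Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology, Shandong Normal University; Laboratory and Equipment Management Office, Shandong Normal University
[Classification number]: TP181
Article ID: 2238937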
[Similar literature]
Related journal articles (top 1)
1 Liu Huimin; Wang Hongqiang; Li Xiang; Research on data normalization for target recognition based on the RPROP algorithm [J]; Modern Radar; 2009, No. 05
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2238937.html