Regularization Methods for Optimizing the Structure of Feedforward Neural Networks
Published: 2021-10-25 14:35
In recent years, finding the most suitable architecture for a feedforward neural network (FNN) has attracted considerable attention. Several studies have proposed automatic methods for finding a small yet sufficient network structure without extra retraining or repair. Regularization terms are often introduced into the learning process and have proved effective at improving generalization performance and reducing network size. In particular, Lp regularization is used during network training to penalize excessively large weight norms; L1 and L1/2 regularization are the two most popular Lp regularization methods. However, Lp regularization is usually applied only to prune redundant individual weights; in other words, L1 and L1/2 regularization cannot improve sparsity at the unit (hidden-neuron) level. This thesis addresses this problem. First, we investigate a Group Lasso regularization method that acts directly on the norm of the outgoing weight vector of each hidden-layer neuron. For comparison, the ordinary Lasso regularization method is simply added to the standard error function of the network and treats each weight individually. Numerical results show that, on every benchmark dataset, the proposed hidden-layer regularization method prunes more redundant hidden neurons than the Lasso regularization method. However, while Group Lasso regularization can prune redundant hidden nodes, it cannot prune any redundant weights of the surviving hidden nodes of the network. Next, we propose a Group L1/2 regularization method (denoted GL1/2), which treats the outgoing weight vector of each hidden node as a group in order to prune hidden nodes. Its advantage is that it can prune not only redundant hidden nodes but also...
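To make the difference between these penalties concrete, below is a minimal NumPy sketch. It is not taken from the thesis: the function names, the smoothing constant eps, the exact form chosen for the GL1/2 term, and the toy gradient update at the end are all illustrative assumptions. W is a hidden-to-output weight matrix whose row W[j] is the outgoing weight vector of hidden node j, so driving a whole row to zero removes that node.

```python
import numpy as np

# Illustrative sketch of penalty terms that could be added to a batch
# gradient objective of the form  E(W) = error(W) + lam * penalty(W).
# Row W[j] holds the outgoing (hidden-to-output) weights of hidden node j.

def lasso_penalty(W):
    # Ordinary Lasso: sum of absolute values of individual weights.
    # It sparsifies single weights but does not remove whole hidden nodes.
    return np.sum(np.abs(W))

def group_lasso_penalty(W):
    # Group Lasso: sum over hidden nodes of the Euclidean norm of each
    # outgoing weight vector; a vanishing group norm removes the node.
    return np.sum(np.linalg.norm(W, axis=1))

def group_l_half_penalty(W):
    # One plausible form of a Group L1/2 (GL1/2) penalty -- the exact
    # definition used in the thesis may differ: the square root of the
    # L1 norm of each outgoing weight vector, which can zero out whole
    # nodes and also individual weights inside surviving nodes.
    return np.sum(np.sqrt(np.sum(np.abs(W), axis=1)))

def group_lasso_gradient(W, eps=1e-8):
    # Gradient of the Group Lasso term with respect to W; eps is a small
    # smoothing constant (an assumption here) that avoids division by
    # zero once a group norm has been driven to zero.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / (norms + eps)

# Toy batch-gradient update with a Group Lasso penalty (illustrative only).
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))             # 5 hidden nodes, 3 output nodes
error_grad = rng.normal(size=W.shape)   # placeholder for dE_error/dW
lam, eta = 0.01, 0.1                    # penalty coefficient, learning rate
W -= eta * (error_grad + lam * group_lasso_gradient(W))
```

The design point is that the group penalties couple all outgoing weights of a hidden node through a single norm, so the whole node can be shrunk to zero and pruned, whereas the plain Lasso term treats each weight in isolation and therefore only removes individual connections.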
Source: Dalian University of Technology (Liaoning Province; Project 211 and Project 985 institution, directly administered by the Ministry of Education)
Pages: 105
Degree level: Doctoral (Ph.D.)
Table of contents:
ABSTRACT
Abstract (in Chinese)
1 Introduction
1.1 Artificial neural networks
1.1.1 Historical development of artificial neural networks
1.1.2 Biological neurons
1.1.3 Artificial neuron model and its basic elements
1.1.4 Activation functions
1.2 Learning mechanisms in artificial neural networks
1.2.1 Supervised learning
1.2.2 Unsupervised learning
1.2.3 Batch and online gradient descent methods
1.3 Artificial neural network architectures
1.3.1 Feedforward neural network
1.3.2 Recurrent neural network
1.4 Problem statement
1.5 Objectives of the thesis
1.6 Outline of the thesis
2 Background: Methods for optimizing artificial neural network architecture
2.1 Network growing method
2.2 Network pruning method
2.2.1 Sensitivity analysis methods
2.2.2 Penalty (Regularization) methods
2.2.3 Batch gradient method with L_(1/2) regularization term
3 Group lasso regularization method for pruning hidden layer nodes of feedforward neural networks
3.1 Introduction
3.2 Neural network structure and batch gradient method without any regularization term
3.3 Batch gradient method with hidden layer regularization terms
3.3.1 Batch gradient method with lasso regularization term
3.3.2 Batch gradient method with Group Lasso regularization term
3.4 Datasets
3.4.1 K-fold cross-validation method
3.4.2 Data normalization
3.5 Hidden neuron selection criterion
3.6 Results
3.6.1 The iris results
3.6.2 The zoo results
3.6.3 The seeds results
3.6.4 The ionosphere results
3.7 Discussion
4 Group L_(1/2) regularization method for pruning hidden layer nodes of feedforward neural network
4.1 Introduction
4.2 Feedforward neural network and batch gradient algorithm
4.3 GL_2, GL_(1/2) and SGL_(1/2) regularizations for hidden nodes
4.3.1 Batch gradient method with GL_2 regularization
4.3.2 Batch gradient method with Group L_(1/2) regularization
4.3.3 Batch gradient method with smooth Group L_(1/2) regularization
4.4 A convergence theorem
4.5 Simulation results
4.5.1 Iris datasets
4.5.2 Balance scale datasets
4.5.3 Ecoli datasets
4.5.4 Lymphography datasets
4.5.5 Why can GL_(1/2) prune the redundant weights of the surviving hidden nodes?
4.6 Proofs
5 Conclusion and Future Work
5.1 Conclusion
5.2 Future work
5.3 Abstract of innovation points
References
Published academic articles during Ph.D. period
Acknowledgements
Author Bio
Article ID: 3457602
Article link: http://sikaile.net/shoufeilunwen/xxkjbs/3457602.html