A Study of Several Problems in Variable Selection for Regression Models

Published: 2018-05-19 16:52

Topic: variable selection + Gamma distribution; Source: Lanzhou Jiaotong University, 2017 master's thesis

【Abstract】: In multiple linear regression modeling, the choice of independent variables is critical. The number of independent variables is generally constrained from two perspectives: prediction accuracy and model interpretability. A large number of independent variables can capture more information about the response variable and so achieve higher prediction accuracy, but too many of them weaken the interpretability of the model and greatly reduce its practical value. Too few fail to capture enough information about the response variable, so prediction accuracy drops significantly. Most research on variable selection starts from ordinary least squares and adds a constraint on the parameters to be estimated, that is, a penalty function, which turns the problem into penalized least squares. Because the constraint shrinks the estimates, some of the parameters become exactly 0, and this achieves variable selection. Commonly used classical algorithms of this kind include LASSO, adaptive LASSO, SCAD, and the elastic net. Under the premise that the parameters to be estimated are affected by random factors, this thesis establishes a new penalty function and a corresponding penalized least squares estimation method, and evaluates that method. The specific content is as follows.

First, the thesis systematically introduces the development of variable selection methods and the basic idea of achieving variable selection by adding a penalty function. It analyzes in detail how LASSO, adaptive LASSO, SCAD, and the elastic net are constructed, along with their respective strengths and weaknesses. Owing to the nature of its penalty function, LASSO tends to select too many independent variables, and it performs poorly under multicollinearity. Adaptive LASSO improves on LASSO by producing sparser coefficient estimates and selecting fewer independent variables. SCAD goes further: it selects fewer independent variables, and its estimator also satisfies a series of desirable properties, namely sparsity, unbiasedness, continuity, and the Oracle property. The elastic net is a variable selection method that combines LASSO with classical ridge regression, and its main advantage lies in handling group effects among the independent variables.

Second, because the Gamma and Weibull distributions are two important classes of lifetime distributions with wide application, the random factors affecting the parameters are assumed to follow a Gamma distribution and a Weibull distribution respectively, and new penalty functions and penalized least squares estimation methods are established on that basis. The thesis constructs the new penalty functions through hierarchical maximum likelihood estimation, discusses their properties, gives a method for estimating the parameters, and proves that the newly established penalized least squares estimator satisfies the Oracle property.

Finally, the new variable selection methods are evaluated through case studies. Using mean squared error and mean absolute error as evaluation metrics, the thesis analyzes classic cases used in earlier literature, computes each metric, and compares the results with those of LASSO, adaptive LASSO, SCAD, and the elastic net. The new algorithms show a clear advantage in sparse settings, where they outperform all of the other algorithms, while in non-sparse settings their performance differs little from that of adaptive LASSO.
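The shrink-to-zero mechanism described in the abstract can be illustrated with the standard LASSO penalty. The sketch below is not the thesis's Gamma- or Weibull-based penalty, whose form the abstract does not give. It is a minimal NumPy coordinate-descent LASSO, and the function names (`soft_threshold`, `lasso_cd`), the synthetic coefficients, the sample sizes, and the penalty weight `lam=0.5` are all assumptions chosen for illustration. It also computes the two evaluation metrics the abstract names, mean squared error and mean absolute error, on held-out data.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward 0 by t; values with |z| <= t become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic
    coordinate descent (no intercept; assumes roughly centered data)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # (1/n) * x_j^T x_j for each column
    residual = y - X @ beta
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the residual that excludes
            # column j's own current contribution.
            rho = X[:, j] @ residual / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            # Keep the residual consistent with the updated coefficient.
            residual += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta


# Synthetic sparse problem (illustrative values, not data from the thesis).
rng = np.random.default_rng(0)
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
X_train = rng.standard_normal((500, 8))
y_train = X_train @ beta_true + rng.standard_normal(500)
X_test = rng.standard_normal((500, 8))
y_test = X_test @ beta_true + rng.standard_normal(500)

beta_hat = lasso_cd(X_train, y_train, lam=0.5)

# Evaluation metrics named in the abstract, computed on held-out data.
errors = y_test - X_test @ beta_hat
mse = np.mean(errors ** 2)
mae = np.mean(np.abs(errors))

print("estimated coefficients:", np.round(beta_hat, 3))
print("selected variables:", np.flatnonzero(beta_hat))
print(f"test MSE = {mse:.3f}, test MAE = {mae:.3f}")
```

The soft-thresholding step is what sets a coefficient exactly to 0 once its correlation with the residual falls below `lam`. The same penalty also shrinks the retained coefficients toward 0, which is the bias that adaptive LASSO and SCAD are designed to reduce.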
【Degree-granting institution】: Lanzhou Jiaotong University
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: F224



Article ID: 1910897


Article link: http://sikaile.net/jingjifazhanlunwen/1910897.html
