Shrinkage Estimation with the SLS Method Based on Generalized Linear Models
Published: 2018-10-31 19:20
[Abstract]: With the arrival of the big data era, high-dimensional data have become abundant. This poses challenges for variable selection methods and has made variable selection a central topic in modern statistical research. Generalized linear models are common statistical models that are widely used in practice, yet variable selection methods for them have received comparatively little study and application. This thesis therefore introduces a combined-penalty variable selection method, the SLS method, and extends it to generalized linear models, specifically to the Logistic regression model. Monte Carlo simulations are run under three scenarios: (1) comparing the SLS method with the MCP method when the variables are weakly correlated; (2) comparing the SLS method with the MCP method on high-dimensional data with highly correlated variables; (3) evaluating variable selection when the variables exhibit both collinearity and high correlation, with the results compared against Lasso, Adaptive Lasso, Elastic-net, and Adaptive Elastic-net. The SLS estimates are computed with the coordinate descent algorithm (CCD), and the tuning parameters are selected by 5-fold cross-validation. The results show that: (1) whether the variables are weakly or highly correlated, the SLS method selects variables effectively and improves on MCP; (2) when the variables exhibit multicollinearity and high correlation, all five methods (Lasso, Adaptive Lasso, Elastic-net, Adaptive Elastic-net, and SLS) remove the collinear variables from the model, but SLS also selects all of the highly correlated variables into the model and thus outperforms the other four methods.
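The abstract names MCP and coordinate descent without giving formulas, so the following is only an illustrative sketch and not the thesis's implementation. It shows the standard minimax concave penalty (MCP) and the closed-form single-coordinate update that coordinate descent typically applies when predictors are standardized. The function names and the parameters `lam` and `gamma` are hypothetical, chosen here for illustration. SLS is usually described as combining MCP with an additional quadratic penalty on correlated coefficients; that second term is not sketched here.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_penalty(beta, lam, gamma):
    """MCP value for a single coefficient (lam >= 0, gamma > 1).

    Behaves like the Lasso penalty near zero, then flattens out so that
    large coefficients are not shrunk further.
    """
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2.0 * gamma)
    return 0.5 * gamma * lam * lam


def mcp_update(z, lam, gamma):
    """Minimizer over b of 0.5 * (b - z)**2 + mcp_penalty(b, lam, gamma).

    Here z stands for the unpenalized single-coordinate solution
    (an assumption: the predictor is standardized to unit scale).
    """
    if gamma <= 1.0:
        raise ValueError("gamma must exceed 1 for a unique minimizer")
    if abs(z) <= gamma * lam:
        # Inside the concave region: rescaled soft-thresholding.
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    # Beyond gamma * lam the penalty is constant, so no shrinkage applies.
    return z


if __name__ == "__main__":
    # Small, moderate, and large inputs: zeroed, shrunk, and left unchanged.
    for z in (0.5, 2.0, 5.0):
        print(z, mcp_update(z, lam=1.0, gamma=3.0))
```

In a full coordinate descent loop, this update would be applied to each coefficient in turn until the estimates stop changing, with `lam` and `gamma` chosen by a procedure such as the 5-fold cross-validation mentioned in the abstract.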
[Degree-granting institution]: Jinan University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: O212
Article ID: 2303293
Article link: http://sikaile.net/kejilunwen/yysx/2303293.html