Variable Selection for Simplex Distribution Models and Its Application
Published: 2018-11-18 12:22
[Abstract]: Barndorff-Nielsen and Jorgensen (1991) proposed the simplex distribution, a continuous distribution on the unit interval (0, 1) that is well suited to analyzing proportion data. Economic research often involves data that can be modeled by the simplex distribution, such as the Gini coefficient and the Engel coefficient, both of which are random variables taking values in (0, 1). Variable selection is one of the most important problems in statistical modeling, and methods for it have long been an active research topic in statistics. The Lasso method proposed by Tibshirani (1996) performs variable selection and parameter estimation simultaneously, overcomes some shortcomings of traditional methods, and brought new vitality to the field of model selection. In particular, LARS (Efron et al., 2004), an efficient algorithm for computing the Lasso, made the method widely popular. This thesis uses the LARS-Lasso method to study variable selection for the simplex distribution model and applies it to an analysis of the factors influencing China's Gini coefficient. The main work covers three aspects:

1. The LARS-Lasso method is applied to variable selection in the simplex distribution model, yielding Lasso estimates for the model. Starting from the likelihood function of the simplex distribution model, a Lasso penalty term is introduced, and a local quadratic approximation based on Newton iteration converts the maximum likelihood (ML) problem into a least squares (LS) type problem, so that the LARS algorithm can be used to compute the estimates efficiently.

2. Simulation study. Programs written in R compare the performance of the LARS-Lasso method with stepwise regression, verifying the feasibility and effectiveness of LARS-Lasso variable selection for the simplex distribution model.

3. Real data analysis. First, an exploratory analysis is carried out on a set of Gini coefficients for countries around the world over several years, published on the World Bank website (sample size 644). R is used to draw the frequency histogram, the kernel density estimate, the fitted normal density curve, and the fitted simplex density curve, which supports the claim that proportion data approximately follow a simplex distribution. The method is then applied to the factors influencing China's Gini coefficient. The analysis identifies the urban-rural income gap, social security expenditure, and other factors as the main variables affecting the Gini coefficient, and the results are good.

In summary, this thesis gives a fairly systematic study of the Lasso method for variable selection in the simplex distribution model, extending the work of Tibshirani (1996), Barndorff-Nielsen and Jorgensen (1991), and others. The simulation study and the real data analysis show that the proposed method is simple and effective, and that it is particularly useful for analyzing proportion data.
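For readers unfamiliar with the simplex distribution named in the abstract, the following is a minimal sketch of its density in the standard parameterization S^-(mu, sigma2), with unit deviance d(y; mu) = (y - mu)^2 / (y (1 - y) mu^2 (1 - mu)^2). The thesis itself used R and its code is not reproduced on this page; the Python function names below (`unit_deviance`, `simplex_logpdf`, `simplex_pdf`, `integrate_pdf`) are illustrative assumptions, not the author's implementation.

```python
import math


def unit_deviance(y, mu):
    """Unit deviance d(y; mu) of the simplex distribution, for y, mu in (0, 1)."""
    return (y - mu) ** 2 / (y * (1.0 - y) * mu ** 2 * (1.0 - mu) ** 2)


def simplex_logpdf(y, mu, sigma2):
    """Log-density of S^-(mu, sigma2) at a point y in (0, 1)."""
    return (
        -0.5 * math.log(2.0 * math.pi * sigma2)
        - 1.5 * math.log(y * (1.0 - y))
        - unit_deviance(y, mu) / (2.0 * sigma2)
    )


def simplex_pdf(y, mu, sigma2):
    """Density of S^-(mu, sigma2); zero outside the open unit interval."""
    if not 0.0 < y < 1.0:
        return 0.0
    return math.exp(simplex_logpdf(y, mu, sigma2))


def integrate_pdf(mu, sigma2, n=2000):
    """Midpoint-rule sanity check: the density should integrate to about 1."""
    h = 1.0 / n
    return h * sum(simplex_pdf((i + 0.5) * h, mu, sigma2) for i in range(n))


if __name__ == "__main__":
    # Hypothetical parameter values chosen only for illustration.
    print(integrate_pdf(mu=0.4, sigma2=1.0))
```

Summing `simplex_logpdf` over the observations gives the log-likelihood to which a Lasso penalty would be added. How the thesis links mu to covariates (for example through a logit link) is not stated in the abstract, so that part is left out here.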
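[Degree-granting institution]: Guizhou University of Finance and Economics
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: F224; F124.7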
Article ID: 2340029
Article link: http://sikaile.net/jingjilunwen/zhongguojingjilunwen/2340029.html