Penalized-Likelihood-Based Variable Selection Methods and Their Application to High-Dimensional Data
[Abstract]: With the rapid development of information technology, both the volume of available data and the number of variables keep growing. How to choose the best model from many candidates has become an important research topic in econometrics. A good variable selection method overcomes problems of traditional methods such as heavy computation and over-fitting: the selected model predicts accurately, effectively eliminates irrelevant variables, and is as parsimonious as possible. Because it is a continuous optimization process, the penalized likelihood method is more stable than traditional discrete selection methods, and with a suitable algorithm it can be carried out efficiently even when the number of variables is large. For high-dimensional data models, selecting the model by penalized likelihood is therefore more effective, accurate, and stable. Based on the penalized likelihood method, this paper studies variable selection methods for several kinds of high-dimensional data models. The resulting methods perform model selection and parameter estimation simultaneously. Using probability theory and mathematical statistics, it is proved that the estimators have the oracle property: the correct model is selected with probability approaching 1, and the estimator is asymptotically normal. The main conclusions are as follows. First, an adaptive bridge estimation method for high-dimensional data models is proposed. Inspired by bridge estimation, the method applies different weights to the penalty term according to the importance of each variable, and the paper studies whether the adaptive bridge estimator meets the criterion of a good estimator, that is, whether it has the oracle property.
The oracle property requires that the correct model be selected with probability approaching 1 and that the estimator be asymptotically normal. This paper proves that the adaptive bridge estimation method has the oracle property under suitable conditions. Its numerical and empirical performance is evaluated by simulation and on real data. Second, this paper studies M-estimation for the linear regression model with high-dimensional data and discusses the properties of the estimator when the penalty term is handled by local linear approximation. The M-estimation framework includes least squares estimation and Huber regression as special cases. When there are outliers or the error terms follow a heavy-tailed distribution, some special cases of M-estimation are more robust than least squares estimation. It is proved theoretically that, under certain conditions, the estimator obtained by combining local linear approximation with the M-estimation objective function has good large-sample properties. With an appropriate algorithm, the method is shown to be more robust; for ultra-high-dimensional data models, backward regression together with the proposed method is also shown to perform better; in the empirical part, real data show that the proposed method selects variables and estimates parameters well. Finally, this paper studies the identification of customers at risk of credit default based on the logistic model. The logistic model, which is commonly used in credit scoring, is chosen to identify the factors influencing credit default and to measure and predict the default risk of credit customers. The numerical simulation results show that the proposed variable selection method is effective.
The empirical results also show that the variable selection methods for high-dimensional data models proposed in this paper select models with greater interpretability and predictive ability.
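The abstract describes weighting the penalty term by variable importance but gives neither the objective function nor the algorithm. As a rough illustration of that general idea only, the sketch below fits a weighted L1-penalized least squares problem by coordinate descent, with weights taken from an initial least squares fit. It uses the L1 penalty rather than the thesis's bridge penalty, and the function names, the weight formula, the tuning value `lam=0.1`, and the simulated data are all assumptions made for this example, not details from the thesis.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def weighted_l1_least_squares(X, y, lam, weights, n_iter=200):
    """Coordinate descent for the (assumed) objective
    (1 / (2n)) * ||y - X b||^2 + lam * sum_j weights[j] * |b_j|.
    """
    n, p = X.shape
    beta = np.zeros(p)
    residual = y.astype(float).copy()      # equals y - X @ beta while beta = 0
    col_scale = (X ** 2).sum(axis=0) / n   # (1/n) * x_j' x_j for each column
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual excluding b_j.
            rho = X[:, j] @ residual / n + col_scale[j] * beta[j]
            new_bj = soft_threshold(rho, lam * weights[j]) / col_scale[j]
            # Keep the residual in sync with the updated coefficient.
            residual += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta


# Simulated data, hypothetical and for illustration only.
rng = np.random.default_rng(0)
n, p = 200, 10
true_beta = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0])
X = rng.standard_normal((n, p))
y = X @ true_beta + rng.standard_normal(n)

# Data-driven weights (an assumption): variables with a small initial
# estimate receive a heavier penalty.
beta_init = np.linalg.lstsq(X, y, rcond=None)[0]
weights = 1.0 / (np.abs(beta_init) + 1e-8)

beta_hat = weighted_l1_least_squares(X, y, lam=0.1, weights=weights)
selected = np.flatnonzero(beta_hat)
print("selected variables:", selected)
print("estimates:", np.round(beta_hat, 3))
```

Because soft-thresholding sets small coefficients exactly to zero, a single fit both selects variables and estimates the remaining coefficients, which is the simultaneous selection and estimation the abstract refers to. A robust variant would replace the squared loss with a Huber-type loss, at the cost of a less simple coordinate update.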
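[Degree-granting institution]: University of International Business and Economics
[Degree level]: Doctoral
[Year degree granted]: 2017
[Classification number]: F224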
Article number: 2220773
Article link: http://sikaile.net/shoufeilunwen/jjglss/2220773.html