Variable Selection Methods Based on Penalized Likelihood and Their Application to High-Dimensional Data

Published: 2018-09-03 18:13

[Abstract]: With the rapid development of information technology, both the volume of data and the number of variables available to us keep growing. How to select the best model from a large set of candidates has therefore become an important research topic in econometrics. A good variable selection method alleviates the heavy computation and overfitting that affect traditional methods; the model it selects predicts accurately, effectively excludes nuisance variables, and is as parsimonious as possible. Because the penalized likelihood approach is a continuous optimization procedure, it is more stable than traditional discrete methods, and with a suitable algorithm it can be carried out efficiently even when the number of variables is very large. For high-dimensional data models, model selection by penalized likelihood is therefore more effective, accurate, and stable. Based on the penalized likelihood approach, this thesis studies variable selection for several classes of high-dimensional data models. The resulting methods perform model selection and parameter estimation simultaneously. Using probability theory and mathematical statistics, the thesis also proves that the estimators have the Oracle property: they select the correct model with probability tending to 1, and they are asymptotically normally distributed. The methods studied and the main conclusions are as follows.

First, the thesis proposes an adaptive bridge estimation method for high-dimensional data models. Inspired by bridge estimation, it assigns different weights to the penalty terms according to the importance of each variable, and examines whether the adaptive bridge estimator meets the standard of a good estimator, that is, whether it has the Oracle property. The thesis proves that, under appropriate conditions, the adaptive bridge estimator does have the Oracle property. Random simulations and real data are used to evaluate the numerical and empirical performance of the method.

Second, the thesis studies M-estimation for high-dimensional linear regression models and discusses the properties of the estimator when the penalty term is handled by local linear approximation. M-estimation is a general framework that covers least absolute deviation estimation, quantile regression, least squares estimation, and Huber regression. When the data contain outliers or the error term follows a heavy-tailed distribution, least absolute deviation regression, a special case of M-estimation, is more robust than least squares. The thesis proves theoretically that, under certain conditions, the estimator obtained by combining the M-estimation loss with the local linear approximation penalty in the objective function has good large-sample properties. In the numerical simulations, a suitable algorithm is implemented and shows that the method is more robust. For ultra-high-dimensional data models, simulations also show that combining backward regression with the proposed method performs better. In the empirical part, real data show that the proposed method selects variables and estimates parameters well.

Finally, the thesis studies the identification of customers who default on credit, based on the Logistic model in the high-dimensional setting. The Logistic model, which is widely used in credit scoring, is chosen to identify the factors that influence credit default, and the fitted Logistic model is then used to measure and predict the default risk of credit customers. The numerical simulation results show that the proposed variable selection method is effective. The empirical results also show that the variable selection methods for high-dimensional data models proposed in this thesis can select models with strong explanatory and predictive power.
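The abstract does not state any objective functions. As a rough orientation only, the three settings it describes are commonly written in the forms sketched below. All notation here is an assumption and is not taken from the thesis: the data $(x_i, y_i)$, the coefficient vector $\beta \in \mathbb{R}^p$, the tuning parameter $\lambda_n$, the exponent $\gamma$, the weights $w_j$, the initial estimate $\tilde\beta$, and the penalty function $p_{\lambda_n}$. The thesis may use different scalings, weights, or penalties.

```latex
% Requires amsmath. Illustrative forms under assumed notation, not the thesis's own definitions.

% (1) Adaptive bridge estimation: squared-error loss plus a weighted bridge penalty.
%     The weights w_j are assumed to be larger for less important variables,
%     for example w_j = |\tilde\beta_j|^{-1} computed from an initial estimate \tilde\beta.
\[
  \hat\beta^{\mathrm{AB}}
  = \arg\min_{\beta}\;
    \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^{2}
    + \lambda_n \sum_{j=1}^{p} w_j\,|\beta_j|^{\gamma},
  \qquad 0 < \gamma < 1 .
\]

% (2) Penalized M-estimation with a local linear approximation (LLA) of the penalty:
%     the penalty p_{\lambda_n} is linearized around the initial estimate \tilde\beta.
\[
  \hat\beta^{\mathrm{M}}
  = \arg\min_{\beta}\;
    \sum_{i=1}^{n} \rho\bigl(y_i - x_i^{\top}\beta\bigr)
    + n \sum_{j=1}^{p} p'_{\lambda_n}\bigl(|\tilde\beta_j|\bigr)\,|\beta_j| .
\]

% Standard choices of the loss \rho that the M-estimation framework covers:
\[
  \rho(u) = u^{2} \ \text{(least squares)}, \qquad
  \rho(u) = |u| \ \text{(least absolute deviation)}, \qquad
  \rho_{\tau}(u) = u\bigl(\tau - \mathbf{1}\{u < 0\}\bigr) \ \text{(quantile regression)},
\]
\[
  \rho_{c}(u) =
  \begin{cases}
    \tfrac{1}{2}u^{2}, & |u| \le c,\\[2pt]
    c\,|u| - \tfrac{1}{2}c^{2}, & |u| > c
  \end{cases}
  \quad \text{(Huber)} .
\]

% (3) Penalized Logistic regression for a binary default indicator y_i \in \{0, 1\}:
%     negative log-likelihood plus a generic penalty.
\[
  \hat\beta^{\mathrm{L}}
  = \arg\min_{\beta}\;
    -\sum_{i=1}^{n} \Bigl[\, y_i\, x_i^{\top}\beta
      - \log\bigl(1 + e^{x_i^{\top}\beta}\bigr) \Bigr]
    + n \sum_{j=1}^{p} p_{\lambda_n}\bigl(|\beta_j|\bigr) .
\]
```

In each form the first term measures fit and the second shrinks small coefficients to exactly zero, which is how a single optimization can both select variables and estimate the remaining coefficients.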
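[Degree-granting institution]: University of International Business and Economics
[Degree level]: Doctoral
[Year degree conferred]: 2017
[Classification number]: F224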

Article ID: 2220773

Article link: http://sikaile.net/shoufeilunwen/jjglss/2220773.html
