A Smoothing Algorithm for Censored Quantile Regression
Topic: censored quantile regression; smoothing function. Source: Master's thesis, Beijing Jiaotong University, 2015
【Abstract】: In recent years, high-dimensional linear regression models have attracted much attention in information technology, biology, chemometrics, genomics, economics, finance, functional magnetic resonance imaging, and other scientific fields. A "high-dimensional" regression model is one in which the number of unknown coefficients is much larger than the number of samples. Clearly, without additional assumptions such problems are ill-posed and essentially intractable for current techniques, so some assumption must be imposed on the model. A good choice is the sparsity assumption: only a few of the unknown coefficients actually influence the observations. High-dimensional data analysis poses many challenges to statisticians, and new methods and theory are urgently needed.

To estimate the coefficients of a high-dimensional linear regression, a suitable regression method must be chosen. Ordinary least squares aims to estimate the conditional mean of the response given the explanatory variables, whereas quantile regression models the conditional quantiles of the response. Compared with least squares, conditional quantile regression is more robust and more flexible, so this thesis adopts quantile regression to treat the high-dimensional sparse linear model.

Regularization has long been an effective and widely used way to handle high-dimensional sparse data. Adding a penalty term speeds up convergence and makes the high-dimensional linear model easier to solve, and with a suitable penalty many regression estimators enjoy the oracle property. Common penalties include the lp, l1, and weighted l1 penalties; this thesis considers the weighted l1 penalty.

In medicine, censored quantile regression is a powerful tool for survival analysis. Censored data are data whose sample values cannot be fully observed under some mechanism: for example, when a sample value lies above or below some fixed (or random) threshold, only that threshold is observed. The resulting incomplete data are called censored data. In the medical field, the censored quantile regression model has replaced the Cox proportional hazards model and the accelerated failure time (AFT) model as a main approach to survival analysis. This thesis considers the regularized sparse high-dimensional censored quantile regression model. Because the censored quantile regression objective can ultimately be written as a linear combination of quantile regression objectives, methods for quantile regression carry over to the censored setting.

In this thesis we apply smoothing functions to censored quantile regression for the first time. Chapters 1 through 3 introduce the background and basic properties of quantile regression, censored data, and high-dimensional data. We then present two smoothing functions, including the quantile Huber penalty function, as replacements for the quantile loss. Since the Huber penalty function has the same minimizers as the quantile loss, the theoretical part of the thesis takes it as the main object of study. Smoothing makes the objective function of the censored quantile regression model differentiable, yielding first- and second-order (sub)differentials. On this basis, we combine the weighted l1 penalty with the smoothed loss and design a weighted smoothing iterative algorithm, MIRL, to perform variable selection in censored quantile regression. We not only obtain the convergence of the algorithm but also prove that, under general assumptions, the optimal solution of the model enjoys good statistical properties such as asymptotic normality and the oracle property. In the numerical experiments, covering random Gaussian designs and Toeplitz covariance designs, the most striking feature of the result tables is that the FPR and TPR are almost 0 and 1, respectively: the method selects exactly the active variables, showing that the model and algorithm perform variable selection very well, and the estimation error is also very small.
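To make the contrast with least squares concrete: quantile regression minimizes the check (pinball) loss rather than squared error, and minimizing the check loss over a constant recovers the empirical tau-quantile, just as minimizing squared error recovers the mean. A minimal NumPy sketch (illustration only, not code from the thesis):

```python
import numpy as np

def pinball_loss(residual, tau):
    """Check (pinball) loss rho_tau(r) = r * (tau - 1{r < 0})."""
    residual = np.asarray(residual, dtype=float)
    return residual * (tau - (residual < 0))

def best_constant(y, tau, grid):
    """Grid-search the constant c minimizing sum_i rho_tau(y_i - c);
    the minimizer is the empirical tau-quantile of y."""
    losses = [pinball_loss(y - c, tau).sum() for c in grid]
    return grid[int(np.argmin(losses))]
```

Because the summed check loss is convex and piecewise linear in the constant, its minimizers form an interval bracketed by two order statistics, which is why the estimator is robust to outliers in the response.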
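The weighted l1 penalty described above acts through its proximal operator: coordinate-wise soft-thresholding with a per-coefficient threshold, so coefficients assigned small weights are shrunk little while heavily weighted ones are driven to zero. A small sketch (the specific weights and threshold below are arbitrary illustrative values):

```python
import numpy as np

def weighted_soft_threshold(z, lam, weights):
    """Proximal operator of beta -> lam * sum_j weights[j] * |beta[j]|:
    shrink each coordinate toward zero by its own threshold lam * w_j."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam * weights, 0.0)
```

For example, with `z = [3.0, -0.2, 1.5]`, weights `[0.1, 1.0, 1.0]`, and `lam = 0.5`, the lightly weighted first coordinate is barely shrunk while the small second coordinate is zeroed out, which is the mechanism behind the improved (oracle-type) selection behavior of weighted over plain l1.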
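Fixed censoring can be made concrete with a standard formulation in the spirit of Powell's censored quantile regression (used here only as an illustration, not as the thesis's estimator): the linear predictor is replaced by its censored version inside the check loss. A hypothetical sketch for left-censoring at a known limit `c`:

```python
import numpy as np

def powell_censored_objective(beta, X, y_obs, c, tau):
    """Check-loss objective with a left-censored linear predictor:
    sum_i rho_tau(y_obs_i - max(c, x_i' beta)),
    where y_obs = max(y, c) is what is actually observed."""
    fitted = np.maximum(c, X @ beta)
    r = y_obs - fitted
    return float(np.sum(r * (tau - (r < 0))))
```

Because conditional quantiles are equivariant under monotone transformations such as censoring at `c`, this objective targets the same conditional quantile as the uncensored model wherever the quantile lies above the censoring limit.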
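One standard smoothing of the check loss is the quantile Huber function: quadratic in a kappa-width band around zero and linear with slopes tau and tau-1 outside, so it is differentiable everywhere and matches the check loss's minimizers as kappa shrinks. The reweighted-l1 loop below is only a schematic stand-in for the thesis's MIRL algorithm; the step size, reweighting rule, and iteration counts are illustrative assumptions, not the thesis's specification:

```python
import numpy as np

def quantile_huber(r, tau, kappa):
    """Smoothed check loss: r^2/(2*kappa) on [-kappa*(1-tau), kappa*tau],
    linear with slopes tau / (tau - 1) outside; C^1 everywhere."""
    r = np.asarray(r, dtype=float)
    out = np.empty_like(r)
    hi = r >= kappa * tau
    lo = r < -kappa * (1 - tau)
    mid = ~(hi | lo)
    out[hi] = tau * r[hi] - kappa * tau**2 / 2
    out[lo] = (tau - 1) * r[lo] - kappa * (1 - tau)**2 / 2
    out[mid] = r[mid] ** 2 / (2 * kappa)
    return out

def mirl_sketch(X, y, tau=0.5, kappa=0.1, lam=0.1,
                n_outer=3, n_inner=200, eps=1e-3):
    """Schematic reweighted-l1 proximal-gradient loop (NOT the exact MIRL):
    minimize sum_i quantile_huber(y_i - x_i'beta) + lam * sum_j w_j |beta_j|."""
    n, p = X.shape
    step = kappa / np.linalg.norm(X, 2) ** 2        # 1/L for the smooth part
    beta = np.zeros(p)
    w = np.ones(p)                                   # initial l1 weights
    for _ in range(n_outer):
        for _ in range(n_inner):
            # clip((y - X beta)/kappa, tau-1, tau) is the quantile_huber derivative
            g = np.clip((y - X @ beta) / kappa, tau - 1, tau)
            z = beta + step * (X.T @ g)              # gradient step
            beta = np.sign(z) * np.maximum(np.abs(z) - step * lam * w, 0.0)
        w = 1.0 / (np.abs(beta) + eps)               # near-zero coefs penalized more
    return beta
```

On a toy identity design this zeroes the inactive coordinates exactly while the reweighting leaves the large coefficients nearly unbiased, mirroring the FPR-near-0 / TPR-near-1 behavior reported in the experiments.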
【Degree-granting institution】: Beijing Jiaotong University
【Degree level】: Master's
【Year awarded】: 2015
【CLC number】: O212.1
Article ID: 2107750
Link: http://sikaile.net/kejilunwen/yysx/2107750.html