

A Comparison of Lasso-Based Statistical Inference Methods for High-Dimensional Linear Regression Models

Published: 2018-07-27 15:23
[Abstract]:
Objective: This thesis introduces five Lasso-based statistical inference methods for high-dimensional linear regression models: the Lasso penalized score test (Lassoscore), multiple sample-splitting (MS-split), stability selection, the low-dimensional projection estimate (LDPE), and the covariance test (Covtest). It compares the five methods and analyzes how each performs under different high-dimensional data settings.

Methods: The basic principles of the Lasso penalized score test, multiple sample-splitting, stability selection, the low-dimensional projection estimate, and the covariance test are introduced in turn. Simulated data are generated by varying four parameters: seven sample sizes, n = 50, 75, 100, 150, 200, 300, 400; two numbers of predictors, p = 100, 300; two correlation structures among the predictors, either mutually independent or with corr(X_i, X_j) = 0.5^|i-j|; and two regression coefficient magnitudes, either β_1 = β_2 = β_3 = β_4 = β_5 = 5 with β_j = 0 for j > 5, or β_1 = β_2 = β_3 = β_4 = β_5 = 0.15 with β_j = 0 for j > 5. Combinations of these four parameters define the different high-dimensional data settings. The data were simulated in R and analyzed with all five methods. The expected false positive rate (Expected False Positives, EFP) and power were used as evaluation criteria to compare the five methods across settings.

Results: In the ideal high-dimensional setting, all methods performed well except the covariance test, whose inferences were conservative. Stability selection had the lowest EFP and the highest power, making it the best of the five. LDPE, stability selection, and multiple sample-splitting all rely on the βmin condition. Stability selection depends on it too heavily, so its power drops sharply and it performs poorly in complex high-dimensional settings. In complex settings, LDPE is conservative at both large and small sample sizes; its power is high at moderate sample sizes, but only at the cost of a very high number of false positives. The covariance test is conservative in every setting. In complex settings, the Lasso penalized score test has the highest power of the five methods, followed by multiple sample-splitting. However, the Lasso penalized score test also has the highest EFP, whereas the EFP of multiple sample-splitting is close to 0.

Conclusion: In the complex high-dimensional settings commonly encountered, the Lasso penalized score test is better than the other four methods at detecting truly nonzero variables, and it places weak requirements on βmin, but its expected false positive rate is high. The ability of multiple sample-splitting to detect truly nonzero variables depends on whether the data satisfy the βmin condition, yet even when the condition fails it ranks second only to the Lasso penalized score test, and its expected false positive rate is very low. The Lasso penalized score test and multiple sample-splitting are therefore the two better inference methods for high-dimensional linear regression in common complex settings, the former being more liberal and the latter more conservative. In practice it is impossible to know whether real data satisfy the βmin condition, but a suitable inference method can still be chosen according to the needs of the application.
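The simulation design and the two evaluation criteria described in the abstract can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the thesis's R code. Several details are assumptions that the abstract does not state: the predictors are taken to be standard normal, the noise standard deviation is 1, the 0.5^|i-j| correlation is produced with an AR(1) recursion, EFP is computed as the mean number of false positives per replicate, and power as the mean fraction of truly nonzero coefficients that are selected. The names `simulate_dataset` and `efp_and_power` are hypothetical.

```python
import itertools
import math
import random

# Parameter values taken from the abstract.
SAMPLE_SIZES = [50, 75, 100, 150, 200, 300, 400]
N_PREDICTORS = [100, 300]
CORRELATIONS = [0.0, 0.5]   # 0.0 = independent; 0.5 = corr(X_i, X_j) = 0.5 ** |i - j|
SIGNAL_SIZES = [5.0, 0.15]  # common value of beta_1 .. beta_5

# Every combination of the four parameters is one simulation setting.
DESIGN = list(itertools.product(SAMPLE_SIZES, N_PREDICTORS, CORRELATIONS, SIGNAL_SIZES))


def simulate_dataset(n, p, rho, beta_value, n_signal=5, noise_sd=1.0, seed=0):
    """Draw one dataset y = X beta + noise for a single setting.

    Assumptions (not stated in the abstract): standard normal predictors
    and Gaussian noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    beta = [beta_value] * n_signal + [0.0] * (p - n_signal)
    scale = math.sqrt(1.0 - rho * rho)
    X, y = [], []
    for _ in range(n):
        # AR(1) recursion: each column has unit variance and
        # corr(X_i, X_j) = rho ** |i - j|; rho = 0 gives independence.
        row = [rng.gauss(0.0, 1.0)]
        for _ in range(1, p):
            row.append(rho * row[-1] + scale * rng.gauss(0.0, 1.0))
        X.append(row)
        signal = sum(b * x for b, x in zip(beta, row))
        y.append(signal + rng.gauss(0.0, noise_sd))
    return X, y, beta


def efp_and_power(selections, true_support):
    """Summarize selected index sets from repeated simulations.

    Assumed definitions: EFP is the mean number of selected null
    variables per replicate; power is the mean fraction of truly
    nonzero variables that are selected.
    """
    truth = set(true_support)
    false_pos = sum(len(set(s) - truth) for s in selections)
    true_pos = sum(len(set(s) & truth) for s in selections)
    efp = false_pos / len(selections)
    power = true_pos / (len(selections) * len(truth))
    return efp, power
```

In use, one would loop over `DESIGN`, call `simulate_dataset` with a different seed for each replicate, run an inference method on each dataset to obtain the indices it declares significant, and pass those index lists to `efp_and_power` together with the true support `range(5)`.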
[Degree-granting institution]: Shanxi Medical University
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: R195.1



Article link: http://sikaile.net/kejilunwen/yysx/2148260.html

