ISIS-Based Variable Selection in the Semiparametric Additive Hazards Model for High-Dimensional Data and Its Application
Published: 2018-04-29 00:35
Topics: high-dimensional data + AHAZISIS model; Source: Chongqing Medical University, master's thesis, 2017
[Abstract]: Objective: This thesis introduces variable selection methods based on ISIS (iterative sure independence screening) in the semiparametric additive hazards model for high-dimensional data, and examines the strengths and weaknesses of the AHAZISIS, AHAZLASSOISIS, AHAZENISIS, and AHAZSSCADISIS models in high-dimensional survival analysis. The aim is to reveal the relationship between gene expression and the time to death or other survival outcomes, and so to provide a gene-level basis for diagnosis, treatment, prognosis, and the improvement of treatment plans. Methods: The basic principles of the AHAZISIS, AHAZLASSOISIS, AHAZENISIS, and AHAZSSCADISIS models are described. Data are simulated to reflect the typical features of bioinformatics data (high dimension, strong correlation, small sample size), and the four models are compared across the simulation settings. Finally, an empirical study is carried out on prostate cancer data from TCGA. Results: (1) Across all simulation settings, the three first-stage penalty functions differed little in consistency and accuracy. (2) Across all settings, among the four second-stage (re-penalization) functions, OS-SCAD performed best in consistency, followed by SSCAD and then Lasso, with EN the worst; in accuracy, OS-SCAD and SSCAD performed better, Lasso came next, and EN was the worst. (3) Across all settings, for the different steps of the second-stage penalty function SSCAD, steps=1 gave the best consistency and steps=2,3,4,5 were close to one another; in accuracy, steps=1 was the worst and steps=2,3,4,5 were again close to one another. (4) For the three first-stage penalty functions, the four second-stage functions, and the different steps of SSCAD, accuracy was negatively associated with the size of the correlation coefficient among covariates: the smaller the correlation, the higher the accuracy, and vice versa. (5) In the empirical study, the AHAZISIS and AHAZSSCADISIS models selected fewer genes and were therefore easier to interpret. Judged by the p-values of the log-rank test, these two models also showed better predictive ability in the empirical study. Conclusion: The models behaved consistently across the simulation study and the empirical study. The AHAZISIS and AHAZSSCADISIS models are easier to interpret and give more accurate estimates, which makes them comparatively reliable for high-dimensional, strongly correlated, small-sample data. The AHAZLASSOISIS and AHAZENISIS models performed worse on such data; the AHAZENISIS model in particular was the worst in both interpretability and estimation accuracy.
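The abstract does not spell out the screening step that ISIS builds on, so the following is a minimal sketch of one way marginal (sure independence) screening can be done under an additive hazards model. For each covariate it computes a univariate Lin-Ying-type estimate d / D and keeps the covariates with the largest absolute estimates. The function names `marginal_stats` and `sis_screen`, the ranking rule, and the demo settings are illustrative assumptions rather than the thesis's implementation. The sketch covers only a single screening pass, not the iterative part of ISIS or the Lasso, EN, and SSCAD penalized fits, and it assumes there are no tied event times.

```python
import random


def marginal_stats(times, events, z, order):
    """Return (d, D) for one covariate in a univariate additive hazards fit.

    d sums, over observed events, z_i minus the mean of z in the risk set.
    D integrates over time the risk-set sum of squared deviations of z.
    Assumes no tied times; `order` lists subject indices by increasing time.
    """
    s1 = sum(z)                    # sum of z over the current risk set
    s2 = sum(v * v for v in z)     # sum of z^2 over the current risk set
    m = len(z)                     # current risk-set size
    d = 0.0
    big_d = 0.0
    prev_t = 0.0
    for i in order:
        zbar = s1 / m
        ss = max(s2 - s1 * s1 / m, 0.0)   # guard against rounding below zero
        big_d += (times[i] - prev_t) * ss
        if events[i]:
            d += z[i] - zbar
        prev_t = times[i]
        # Subject i leaves the risk set after its own time.
        s1 -= z[i]
        s2 -= z[i] * z[i]
        m -= 1
    return d, big_d


def sis_screen(times, events, columns, keep):
    """Rank covariates by |d / D| and return the indices of the top `keep`.

    `columns[j]` holds covariate j for all subjects. Ranking by the absolute
    marginal estimate is an assumption made for this sketch.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    scored = []
    for j, z in enumerate(columns):
        d, big_d = marginal_stats(times, events, z, order)
        beta = d / big_d if big_d > 0 else 0.0   # constant covariate scores 0
        scored.append((abs(beta), j))
    scored.sort(reverse=True)
    return [j for _, j in scored[:keep]]


if __name__ == "__main__":
    # Hypothetical demo: independent uniform covariates, constant baseline
    # hazard 1.0, and only covariates 0 and 1 affecting the hazard.
    rng = random.Random(0)
    n, p = 100, 300
    columns = [[rng.random() for _ in range(n)] for _ in range(p)]
    times, events = [], []
    for i in range(n):
        rate = 1.0 + 2.0 * columns[0][i] + 2.0 * columns[1][i]
        t = rng.expovariate(rate)      # event time under additive hazards
        c = rng.expovariate(0.5)       # independent censoring time
        times.append(min(t, c))
        events.append(1 if t <= c else 0)
    print(sis_screen(times, events, columns, keep=10))
```

The demo uses independent covariates for brevity. The thesis's simulations target strongly correlated covariates, the setting in which a single marginal pass is more likely to miss relevant variables and the iterative variant is usually motivated.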
[Degree-granting institution]: Chongqing Medical University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification numbers]: O212; R195.1
Article ID: 1817609
Article link: http://sikaile.net/kejilunwen/yysx/1817609.html