Regularized Estimation of the Accelerated Failure Time Model Based on the Elastic Net
Published: 2018-03-19 03:20
Topic: accelerated failure time model; Focus: elastic net; Source: Southwest Jiaotong University, 2016 master's thesis; Document type: degree thesis
【Abstract】: An important goal in the study of high-dimensional genetic data is to identify genetic markers associated with the onset and progression of disease; a representative example is the prognostic analysis of microarray data. Finding significantly associated biomarkers in microarray gene expression data is difficult. The high dimensionality of gene expression data means that standard survival analysis techniques cannot be applied directly, and of the thousands of genes studied only a small fraction are related to the disease. When the outcome is time-to-event data, exact values are often unavailable because of censoring, which makes screening for relevant genes especially challenging. We propose regularizing the Gehan estimator of the accelerated failure time model with an elastic net penalty in order to select the genes that have an important effect on survival time. The estimates are computed with an algorithm similar to that used for the LASSO, and properties of the estimator are proved. Unlike existing approaches based on inverse probability weighting or the Buckley and James estimator, the proposed method requires no additional assumptions about the censoring, which makes it more generally applicable. We carried out extensive numerical simulations, some of which adopt the simulation settings of the 2009 article by Cai, T., to validate the method in finite samples. Comparison with the method of Cai, T. shows that the proposed method improves variable selection and can handle the case where the number of variables exceeds the number of observations, which the method of Cai, T. cannot. The proposed method also has shortcomings: for example, when the covariates are highly correlated, its mean squared error is larger than that of Cai, T. Finally, we apply the method to the lung adenocarcinoma data from the study by Beer, D. and select genes associated with lung adenocarcinoma. The selected set includes genes not identified in the article by Beer, D., and t-tests indicate that these genes have a significant effect on whether a patient has the disease. Whether the selected genes are truly related to the disease still needs to be confirmed by follow-up clinical studies.
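The abstract gives no formulas, so the following is only a minimal sketch of what an elastic-net-penalized Gehan objective for the accelerated failure time model could look like. It uses the standard form of the Gehan loss on log-scale residuals; the 1/n^2 scaling, the penalty parametrization (lam1 for the L1 term, lam2 for the squared L2 term), and all function and argument names are assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np

def gehan_loss(beta, X, log_t, delta):
    """Gehan loss: (1/n^2) * sum_i sum_j delta_i * max(e_j - e_i, 0).

    X      : (n, p) covariate matrix
    log_t  : (n,) log of observed (possibly censored) times
    delta  : (n,) event indicator, 1 = event observed, 0 = censored
    """
    e = log_t - X @ beta              # residuals e_i = log T_i - x_i' beta
    diff = e[None, :] - e[:, None]    # diff[i, j] = e_j - e_i
    n = e.shape[0]
    # Only pairs whose first member i is uncensored contribute.
    return float(np.sum(delta[:, None] * np.maximum(diff, 0.0)) / n**2)

def elastic_net_gehan_objective(beta, X, log_t, delta, lam1, lam2):
    """Gehan loss plus an elastic net penalty (assumed parametrization)."""
    penalty = lam1 * np.sum(np.abs(beta)) + lam2 * np.sum(beta**2)
    return gehan_loss(beta, X, log_t, delta) + float(penalty)
```

Because the Gehan loss is a sum of piecewise-linear terms in beta, it is convex, which is consistent with the abstract's remark that a LASSO-like algorithm can be used; the sketch only evaluates the objective and does not implement such a solver.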
【Degree-granting institution】: Southwest Jiaotong University
【Degree level】: Master's
【Year degree granted】: 2016
【Classification number】: C81
【Similar literature】
Related master's theses (top 1)
1 Wang Kaile (王愷樂); Regularized Estimation of the Accelerated Failure Time Model Based on the Elastic Net [D]; Southwest Jiaotong University; 2016
Article ID: 1632603
Article link: http://sikaile.net/shekelunwen/shgj/1632603.html