
Rare Variant Association Analysis of Longitudinal Data Based on Penalized Regression

Published: 2018-04-27 03:43

Topic: longitudinal data + rare variant association analysis; Source: doctoral dissertation, Shanxi Medical University, 2017


[Abstract]: Objective: Compared with cross-sectional data, longitudinal next-generation sequencing (NGS) data make it possible to study how complex traits change over time and how genetic loci act dynamically on complex diseases, and so to improve how much of a complex disease genetic variation can explain. Because rare variants occur at very low frequencies, the single-locus analyses commonly used in genome-wide association studies (GWAS) have too little statistical power when applied to rare variants. Most existing rare-variant analyses take the gene as the unit and study the genetic effect of a group of rare variants. Methods for rare-variant association analysis of longitudinal data are only beginning to emerge. Because longitudinal NGS data have limited sample sizes and unavoidable missing data, existing rare-variant association analyses within the generalized estimating equations (GEE) and linear mixed model (LMM) frameworks face computational challenges. Efficient and computationally feasible association methods are therefore urgently needed for longitudinal NGS data, in order to overcome the shortcomings of existing methods, screen for variant loci or genes with important effects on human complex diseases, provide methodological support for identifying genes related to complex diseases, and supply evidence for the development of precision medicine and the discovery of new targets.

Methods: This thesis proposes rare-variant association analysis methods for longitudinal data based on penalized GEE (pGEE) and the penalized quadratic inference function (pQIF). Within the pGEE and pQIF frameworks, borrowing ideas from the weighted sum statistic (WSS) and from genetic risk scores, all common and rare variants within each gene are combined as a weighted sum to give a new gene score variable. The gene scores are then entered into pGEE and pQIF to study the relationship between gene score and disease and thereby screen for genes associated with complex diseases. Using the real GAW18 genetic data, continuous and binary blood pressure phenotypes were simulated to evaluate parameter estimation and gene selection by the pGEE and pQIF methods under different model settings, and to examine the robustness and consistency of gene selection under different working correlation matrices. Finally, a pathway-based analysis of the real GAW18 data was carried out on two important hypertension-related pathways, the renin-angiotensin system (RAS) and the Ca2+/AT-IIR/a-AR signaling pathway, to identify hypertension-related genes.

Results: Parameter estimates from penalized GEE and penalized QIF were far more accurate than those from unpenalized GEE and QIF. As the sample size increased, the estimation accuracy of the penalized models approached that of the oracle model, that is, the true model containing only the variables with nonzero coefficients. For a continuous response, parameter estimation and variable selection with pGEE and pQIF were slightly better than for a binary response, which reflects the greater complexity of the binary model. pQIF had a very low false selection rate, and its parameter estimates were robust and consistent across working correlation matrix settings, outperforming pGEE. However, when the sample size was small and the dimension high, pQIF could not correctly select the effect genes, whereas pGEE still selected them with a high correct selection rate. Therefore, in rare-variant association analysis of longitudinal data, pQIF should be used when the sample size is small and the dimension low, to avoid false selections, and pGEE when the sample size is small and the dimension high. In the Ca2+/AT-IIR/a-AR signaling pathway, pGEE and pQIF both identified the gene AGTR1. In the RAS pathway, pGEE identified THOP1 and PRCP, and pQIF identified THOP1 and ACE.

Conclusion: pGEE- and pQIF-based rare-variant association analysis methods were constructed for longitudinal NGS data. The two methods complement each other, can be applied when the number of covariates grows with the sample size, and effectively identify genes associated with complex diseases. As longitudinal NGS data become more plentiful, these methods will find wider application.
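
The gene score step described in the abstract (a weighted sum over all variants within a gene) can be illustrated with a minimal sketch. The abstract does not give the exact weights, so the inverse-standard-deviation weights below, the smoothed allele-frequency estimate, and the function name `gene_scores` are assumptions made for illustration only, loosely in the spirit of the weighted sum statistic; they are not the thesis's implementation.

```python
import numpy as np

def gene_scores(genotypes, gene_of_variant):
    """Collapse a genotype matrix into one weighted score per gene.

    genotypes: (n_subjects, n_variants) minor-allele counts (0, 1 or 2).
    gene_of_variant: length n_variants sequence of gene labels.
    Returns (genes, scores); scores has shape (n_subjects, n_genes).
    """
    G = np.asarray(genotypes, dtype=float)
    labels = np.asarray(gene_of_variant)
    n = G.shape[0]
    # Smoothed minor-allele frequency; the +1 / +2 keeps q strictly in (0, 1).
    q = (G.sum(axis=0) + 1.0) / (2.0 * n + 2.0)
    # ASSUMED weights: rarer variants receive larger weight.
    w = 1.0 / np.sqrt(q * (1.0 - q))
    genes = sorted(set(labels.tolist()))
    # Weighted sum of allele counts over the variants belonging to each gene.
    scores = np.column_stack(
        [G[:, labels == g] @ w[labels == g] for g in genes]
    )
    return genes, scores

# Toy data: 4 subjects, 3 variants, 2 genes (values are made up).
G = np.array([[0, 1, 0],
              [2, 0, 0],
              [1, 0, 1],
              [0, 0, 0]])
genes, S = gene_scores(G, ["ACE", "ACE", "AGTR1"])
```

Each column of `S` would then serve as one covariate in the penalized GEE or QIF model, so variable selection operates on genes instead of on individual variants.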

[Degree-granting institution]: Shanxi Medical University
[Degree level]: Doctorate
[Year degree granted]: 2017
[Classification number]: O212.1


Article ID: 1809052



Article link: http://sikaile.net/shoufeilunwen/jckxbs/1809052.html

