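Construction of Mixed Latent Variable Models and Their Application in Genetic Association Analysis
Published: 2018-07-07 10:40
Topics: mixed latent variable model + single nucleotide polymorphisms (SNPs); Source: master's thesis, 山西医科大学 (Shanxi Medical University), 2012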
[Abstract]: Structural equation mixture modeling (SEMM), referred to here as the mixed latent variable model, is a theoretical framework for data that contain both categorical and continuous latent variables. As a second-generation structural equation model, SEMM combines ideas from factor analysis, latent class analysis, and latent profile analysis, and it aims to offer a new approach to the analysis of latent variables. It addresses two limitations: structural equation models handle only continuous latent variables, and latent class analysis handles only categorical latent variables. It also gives researchers in medicine, sociology, psychology, and other fields a new way to approach complex data. The method arose in response to the increasingly complex data produced by modern medical research, so introducing SEMM into medical research has practical value.
This thesis systematically introduces the theory of the mixed latent variable model, including the theory of its submodels and the construction, parameter estimation, and evaluation of the mixed model. For parameter estimation, it covers conventional maximum likelihood (ML) estimation and the expectation-maximization (EM) algorithm, an iterative and widely used procedure for computing maximum likelihood estimates that is often applied to data with missing values. The model evaluation indices include the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the consistent Akaike information criterion (CAIC), and the integrated completed likelihood criterion with BIC (ICL-BIC).
Building on this theory, the thesis analyzes two kinds of simulated data, one for the factor mixture model and one for the structural equation mixture model. The applied part analyzes measured SNP data with the mixed latent variable model. The data were provided by GAW17 and comprise more than ten thousand SNPs on the 22 autosomes of 697 individuals, together with traits simulated from those SNPs for the same 697 individuals (3 quantitative traits and 1 qualitative trait). Four SNPs on chromosome 1 and the three quantitative traits were selected as study variables and analyzed by latent class analysis and by the structural equation mixture model. Based on the four SNPs, the population was divided into three latent classes with class probabilities of 0.53, 0.34, and 0.13. The factor means of Q in latent classes 1 and 2 were -4.029 and -2.052, respectively (the factor mean of Q in latent class 3 was fixed at 0), so the factor means of classes 1 and 2 were both lower than that of class 3 (P < 0.001).
The discussion briefly explains the significance of the study, examines model construction, parameter estimation, and model evaluation for the structural equation mixture model, and describes the strengths, limitations, and future prospects of the work.
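The evaluation indices named in the abstract can be illustrated with a short Python sketch. It is not taken from the thesis: it uses the commonly cited textbook definitions (AIC = -2 log L + 2k, BIC = -2 log L + k ln n, CAIC = -2 log L + k(ln n + 1), and ICL-BIC = BIC plus twice the entropy of the posterior class probabilities). The function name `information_criteria` and the numeric inputs are hypothetical.

```python
import math


def information_criteria(log_lik, n_params, n_obs, posteriors=None):
    """Return common mixture-model fit indices (smaller is better).

    log_lik    : maximized log-likelihood of the fitted model
    n_params   : number of freely estimated parameters (k)
    n_obs      : sample size (n)
    posteriors : optional list of rows, one per individual, holding the
                 posterior probability of each latent class (rows sum to 1)
    """
    deviance = -2.0 * log_lik
    aic = deviance + 2.0 * n_params
    bic = deviance + n_params * math.log(n_obs)
    caic = deviance + n_params * (math.log(n_obs) + 1.0)
    result = {"AIC": aic, "BIC": bic, "CAIC": caic}

    if posteriors is not None:
        # Entropy of the class assignment; terms with p == 0 contribute 0.
        entropy = -sum(
            p * math.log(p) for row in posteriors for p in row if p > 0.0
        )
        # ICL-BIC penalizes BIC further when classes are poorly separated.
        result["ICL-BIC"] = bic + 2.0 * entropy
    return result


# Hypothetical values: only the sample size 697 comes from the abstract.
scores = information_criteria(log_lik=-1500.0, n_params=20, n_obs=697)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

When comparing models with different numbers of latent classes, the candidate with the smallest value of a given index is usually preferred. The indices can disagree, because BIC and CAIC penalize extra parameters more heavily than AIC once the sample is moderately large.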
[Degree-granting institution]: 山西医科大学 (Shanxi Medical University)
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: R346
Article ID: 2104713
Article link: http://sikaile.net/xiyixuelunwen/2104713.html