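Haplotype-Based Association Analysis Methods
Published: 2018-01-25 05:17
Keywords: haplotype association analysis; logistic regression; haplotype clustering; case-control study; U-statistic; entropy. Source: Northeast Normal University (東北師范大學(xué)), 2011 doctoral dissertation. Document type: dissertation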
[Abstract]: The completion of the Human Genome Project has greatly enriched human genetic data resources, in both quantity and quality, but it has also made it easy to get lost in this vast sea of information. Statistics, as a powerful tool for data analysis, is receiving increasing attention and plays an irreplaceable role in genetic epidemiology research.
Association analysis seeks to find and locate disease-causing genes mainly by studying the statistical correlation between genetic markers and observable traits, and it has played an important role in improving our understanding of the genetic basis of disease. Haplotypes, a common data type, are considered to carry more linkage disequilibrium (LD) information, and haplotype-based association analysis has greater power to identify disease associations than other methods, especially for rare diseases in case-control studies. However, when these haplotypes are modeled, rare haplotypes cause many statistical problems: the large number of parameters reduces both power and efficiency. Haplotype clustering is a good way to overcome these problems. This thesis focuses on how to make effective use of the information at individual loci and between loci to improve the power of tests in haplotype-based association analysis. It presents one parametric method and one nonparametric method.
The thesis first introduces a method for association analysis based on haplotype clustering, called APEG, which clusters haplotypes effectively and reasonably by applying the AP algorithm with the EG distance. The newly proposed EG distance, a similarity measure designed for haplotype data, can exploit structural information at different loci and between loci. Simulation and real-data studies show that APEG has greater power than other existing methods for detecting whether haplotypes are associated with disease, and that it also gives fairly accurate estimates in gene mapping. The thesis then introduces U-EGS, a nonparametric method based on U-statistics, whose advantages are asymptotic normality and the absence of any assumption about the distribution of the population. The new kernel function EGS introduced in U-EGS is a generalization of the EG distance and can likewise use locus information. Subsequent simulation studies confirm that, under different parameter settings and for different disease models, the U-statistic with the EGS kernel, which incorporates locus information, has greater statistical power than U-statistics that do not use locus information.
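As a rough illustration of the U-statistic idea mentioned in the abstract, the sketch below averages a symmetric kernel over all unordered pairs of haplotypes in a group and contrasts cases with controls. It is not the thesis's U-EGS method: the EGS kernel is not defined in this abstract, so a plain allele-matching kernel stands in for it, and the function names and the toy haplotype strings are hypothetical. No variance estimate or standardization is attempted.

```python
from itertools import combinations


def match_kernel(h1, h2):
    """Stand-in kernel (an assumption, not EGS): fraction of loci
    at which two haplotypes carry the same allele."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same loci")
    return sum(a == b for a, b in zip(h1, h2)) / len(h1)


def mean_pairwise_similarity(haplotypes, kernel=match_kernel):
    """Order-2 U-statistic: the kernel averaged over all unordered pairs."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        raise ValueError("need at least two haplotypes")
    return sum(kernel(a, b) for a, b in pairs) / len(pairs)


def similarity_contrast(cases, controls, kernel=match_kernel):
    """Difference between within-case and within-control similarity.
    A large positive value suggests that cases share haplotype structure."""
    return (mean_pairwise_similarity(cases, kernel)
            - mean_pairwise_similarity(controls, kernel))


# Toy data (hypothetical): each string is one haplotype, one character per locus.
cases = ["0110", "0110", "0111"]
controls = ["0000", "1111", "0101"]
contrast = similarity_contrast(cases, controls)
```

Swapping `match_kernel` for a kernel that weights loci or uses information between neighboring loci is where a locus-aware kernel such as EGS would enter.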
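[Degree-granting institution]: Northeast Normal University (東北師范大學(xué))
[Degree level]: Doctoral
[Year degree awarded]: 2011
[Classification number]: R346
[Citing literature]
Related master's theses: 1 item
1 佟良 (Tong Liang); Interval mapping of QTL when genotypes contain errors [D]; Heilongjiang University (黑龍江大學(xué)); 2013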
Article ID: 1462092
Article link: http://sikaile.net/xiyixuelunwen/1462092.html