Research on Statistical Analysis Methods for Medical Genetic Data and Their Implementation in SAS
Published: 2018-09-14 07:01
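【Abstract】: Mathematical statistics has played an irreplaceable role in the development of medical genetics. As basic medicine has advanced and genetic laboratory techniques have been continually updated, many genetic statistical methods have matured and come into widespread use, while new methods keep emerging. Medical genetics researchers now face two problems: how to apply the newer, more complex methods, and how to compute the mature, widely used ones quickly. This study examines statistical analysis methods for medical genetic data in detail, in particular the correction of genetic results for multiple comparisons, association studies between multiple loci and disease, and linkage analysis. Based on repeated calculation, it puts forward the author's own views, and it implements all of the methods in the authoritative statistical analysis software SAS, by calling procedure steps and by programming. The work covers the main statistical analysis methods currently used in medical genetics in the following parts.
Part 1: Estimating gene and genotype frequencies and testing Hardy-Weinberg equilibrium. The Hardy-Weinberg law plays a very important role in genetic research, and genotype data are best checked against it before any further analysis. This chapter introduces the basic theory of Hardy-Weinberg equilibrium and uses software to compute gene and genotype frequencies, test Hardy-Weinberg equilibrium, and correct the p-value by Monte Carlo simulation.
Part 2: Finding disease-associated loci with the case-control method. The case-control study is one of the most basic and important designs in analytical epidemiology and an important tool for testing etiological hypotheses. In genetic epidemiology, case-control studies can identify genes associated with complex diseases. Either the ordinary χ² test or the Armitage trend test can be used. The ordinary χ² test of association between a disease and a locus requires that the population tested be in Hardy-Weinberg equilibrium. Studies have shown that when Hardy-Weinberg equilibrium does not hold, the type I error rate of the χ² test increases, so the Armitage trend test should be applied to the genotype data instead.
Part 3: Correction of genetic analysis results. With the rapid development of biotechnology, rapid laboratory genotyping of large numbers of loci has become routine in case-control genetic epidemiology. Each locus requires its own statistical test, and when there are many loci, multiple comparisons sharply inflate the overall false-positive rate and make the conclusions unreliable, so the comparisons must be corrected. This chapter applies three smoothing-based corrections as well as adjustment methods such as the Bonferroni and Sidak methods.
Part 4: Association analysis of family data. Using family members as controls is the best way to match on ancestral origin, and controls who share the case's genetic background deal well with population stratification. The appropriate method depends on which family members are available. This chapter applies the TDT, s-TDT, and SDT tests to family-based case-control data.
Part 5: Linkage disequilibrium and haplotype analysis. Linkage disequilibrium analysis and haplotype analysis are efficient methods for fine mapping of disease-associated genes and have played a major role in detecting genes for complex diseases. They do not require family data, which distinguishes them from family-based association analysis and makes their conditions of use less restrictive. This chapter studies in detail methods for detecting linkage disequilibrium and for analyzing the association between haplotypes and disease.
Part 6: Calculation of the inbreeding coefficient and the coefficient of relationship. Consanguineous mating is non-random mating. It strongly disturbs genetic equilibrium in a population and changes the proportions of homozygotes and heterozygotes. The Hardy-Weinberg law applies only to randomly mating populations and not to populations of this kind. This chapter calculates the inbreeding coefficient and the coefficient of relationship for consanguineous matings.
Part 7: Linkage analysis. The frequency with which homologous chromosomes exchange segments during meiosis, when an individual forms its germ cells, is called the recombination fraction. Its size depends on the distance between two loci on the same chromosome: in general, the farther apart they are, the more opportunities for exchange and the higher the recombination fraction. A recombination fraction of 0.50, its theoretical maximum, indicates that the two loci segregate independently, as they do when they lie on different chromosomes. A low recombination fraction indicates that the two loci are close together and that their alleles are not transmitted independently to the next generation, a phenomenon known in genetics as linkage. This chapter mainly introduces Bayesian and Monte Carlo simulation methods for estimating the recombination fraction.
The analyses use multiple procedures from the genetics and stat modules of SAS 9.1.3 and SAS 9.2, together with custom programming, to carry out statistical computation on medical genetic data. The thesis combines statistical model theory with worked examples, theoretical study with software implementation, and mathematical methods with genetic laboratory techniques. Proceeding from simple to complex, it systematically introduces the genetic statistical methods, their models, and their computational principles, treats the correction of genetic results, multi-locus disease association, and linkage analysis in particular detail, and puts forward new viewpoints. It emphasizes practical technique and convenient implementation, providing both a statistical methodology for medical genetics and a new platform for data computation in this branch of the discipline.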
[Abstract]: Mathematical statistics has played an irreplaceable role in the development of medical genetics. With advances in basic medicine and continual renewal of genetic laboratory techniques, many genetic statistical methods have matured and are applied ever more widely, while new methods keep emerging. How to apply the newer, more complex methods, and how to compute the mature, widely used ones quickly, are problems facing medical genetics researchers. This study focuses on the statistical analysis of medical genetic data, especially the correction of genetic results for multiple comparisons, the association between multiple loci and disease, and linkage analysis. Based on repeated calculation, it puts forward the author's own views and implements all of the methods in the authoritative statistical analysis software SAS, by calling procedure steps and by programming.
This study covers the main statistical analysis methods in medical genetics in the following parts:
Part 1: Estimating gene and genotype frequencies and testing Hardy-Weinberg equilibrium.
The Hardy-Weinberg law plays a very important role in genetic research, and genotype data are best checked against it before further analysis. This chapter introduces the basic theory of Hardy-Weinberg equilibrium and uses software to compute gene and genotype frequencies, test Hardy-Weinberg equilibrium, and correct the p-value by Monte Carlo simulation.
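The Part 1 calculations can be illustrated with a minimal sketch. It is written in Python rather than the SAS used in the thesis, and it assumes a single biallelic locus with known genotype counts. The function names, the allele-shuffling scheme, and the default simulation count are illustrative assumptions, not the author's implementation.

```python
import random


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (frequency of allele A, chi-square statistic for HWE)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # allele frequency by gene counting
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, chi2


def hwe_monte_carlo_p(n_aa, n_ab, n_bb, n_sim=2000, seed=1):
    """Monte Carlo p-value: re-pair the observed alleles at random."""
    rng = random.Random(seed)
    _, observed_chi2 = hwe_chi_square(n_aa, n_ab, n_bb)
    alleles = ["A"] * (2 * n_aa + n_ab) + ["B"] * (2 * n_bb + n_ab)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(alleles)
        counts = {"AA": 0, "AB": 0, "BB": 0}
        # Consecutive pairs of shuffled alleles form simulated genotypes.
        for a, b in zip(alleles[0::2], alleles[1::2]):
            counts["AB" if a != b else a + b] += 1
        _, sim_chi2 = hwe_chi_square(counts["AA"], counts["AB"], counts["BB"])
        if sim_chi2 >= observed_chi2 - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_sim + 1)
```

For example, `hwe_chi_square(25, 50, 25)` gives an allele frequency of 0.5 and a statistic of 0, since those counts match the Hardy-Weinberg proportions exactly.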
Part 2: Finding disease-associated loci with the case-control method.
The case-control study is one of the most basic and important designs in analytical epidemiology and an important tool for testing etiological hypotheses. In genetic epidemiology, case-control studies can be used to find genes associated with complex diseases. Either the ordinary χ² test or the Armitage trend test can be used. The ordinary χ² test of association between a disease and a locus requires that the population tested be in Hardy-Weinberg equilibrium. Studies have shown that when Hardy-Weinberg equilibrium does not hold, the type I error rate of the χ² test increases, so the Armitage trend test should be applied to the genotype data instead.
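As an illustration of the Armitage trend test mentioned above, the following Python sketch computes the standard Cochran-Armitage statistic for a 2 x 3 table of genotype counts. The additive scores (0, 1, 2), the function name, and the use of `math.erfc` for the one-degree-of-freedom p-value are assumptions made for this example. The thesis itself works in SAS.

```python
import math


def armitage_trend_test(cases, controls, scores=(0, 1, 2)):
    """Cochran-Armitage trend test on genotype counts (AA, Aa, aa order).

    Returns (chi-square statistic with 1 df, p-value).
    """
    r_total = sum(cases)
    s_total = sum(controls)
    n_total = r_total + s_total
    col_totals = [r + s for r, s in zip(cases, controls)]
    sum_wr = sum(w * r for w, r in zip(scores, cases))
    sum_wn = sum(w * n for w, n in zip(scores, col_totals))
    sum_w2n = sum(w * w * n for w, n in zip(scores, col_totals))
    numerator = n_total * (n_total * sum_wr - r_total * sum_wn) ** 2
    denominator = r_total * s_total * (n_total * sum_w2n - sum_wn ** 2)
    chi2 = numerator / denominator
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

With cases `(10, 20, 30)` and controls `(30, 20, 10)` the statistic works out to 20, a strong trend. Identical genotype distributions in the two groups give a statistic of 0.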
Part 3: Correction of genetic analysis results.
With the rapid development of biotechnology, rapid laboratory genotyping of large numbers of loci has become routine in case-control genetic epidemiology. Each locus requires its own statistical test. When there are many loci, the probability of at least one false positive grows rapidly with the number of tests, which makes the conclusions unreliable, so the multiple comparisons must be corrected. This chapter applies three smoothing-based corrections as well as adjustment methods such as the Bonferroni and Sidak methods.
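The two named adjustments are simple enough to show directly. This Python sketch, whose function names are assumptions for illustration, computes Bonferroni- and Sidak-adjusted p-values and the family-wise error rate for m independent tests. The three smoothing-based corrections are not specified in the abstract and are not reproduced here.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def sidak(p_values):
    """Sidak-adjusted p-values, exact when the m tests are independent."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


def familywise_error_rate(alpha, m):
    """Chance of at least one false positive in m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m
```

For 100 independent loci each tested at 0.05, `familywise_error_rate(0.05, 100)` is above 0.99, which is why uncorrected per-locus results cannot be trusted.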
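Part 4: Association analysis of family data.
Using family members as controls is the best way to match on ancestral origin, and controls who share the case's genetic background deal well with population stratification. The appropriate method depends on which family members are available. This chapter applies the TDT (transmission disequilibrium test), s-TDT (sib-TDT), and SDT (sibship disequilibrium test) to family-based case-control data.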
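Of the family-based tests named in the abstract (TDT, s-TDT, SDT), the basic TDT is the easiest to sketch. The Python example below assumes the data have already been reduced to two counts from heterozygous parents: how often the candidate allele was transmitted to an affected child and how often it was not. The function name and input layout are hypothetical, and the s-TDT and SDT variants are not shown.

```python
import math


def tdt(transmitted, not_transmitted):
    """Transmission disequilibrium test (a McNemar-type chi-square, 1 df).

    transmitted / not_transmitted: counts of the candidate allele passed or
    not passed from heterozygous parents to affected offspring.
    """
    b, c = transmitted, not_transmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For 60 transmissions against 40 non-transmissions the statistic is 4.0, with a p-value of about 0.046.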
Part 5: Linkage disequilibrium and haplotype analysis.
Linkage disequilibrium analysis and haplotype analysis are efficient methods for fine mapping of disease-associated genes and have played a major role in detecting genes for complex diseases. They do not require family data, which distinguishes them from family-based association analysis and makes their conditions of use less restrictive. This chapter studies in detail methods for detecting linkage disequilibrium and for analyzing the association between haplotypes and disease.
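A short Python sketch can make the usual pairwise linkage disequilibrium measures concrete. It assumes two biallelic loci and phase-known haplotype counts. In practice phase often has to be estimated, which this example does not attempt. The function name and argument order are assumptions, and D' is returned as an absolute value.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) from counts of the four two-locus haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B  # deviation from independence of the two loci
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2
```

Counts of `(50, 0, 0, 50)` represent complete disequilibrium (D = 0.25, |D'| = 1, r² = 1), while `(25, 25, 25, 25)` represents equilibrium, where all three measures are 0.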
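Part 6: Calculation of the inbreeding coefficient and the coefficient of relationship.
Consanguineous mating is non-random mating. It strongly disturbs genetic equilibrium in a population and changes the proportions of homozygotes and heterozygotes. The Hardy-Weinberg law applies only to randomly mating populations and not to populations of this kind. This chapter calculates the inbreeding coefficient and the coefficient of relationship for consanguineous matings.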
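The effect described above can be sketched with Wright's path-counting formula. The Python example below assumes the pedigree has already been reduced by hand to a list of paths through common ancestors. The function names and the tuple layout are hypothetical, and the abstract does not say which algorithm the thesis uses.

```python
def inbreeding_coefficient(paths):
    """Wright's path formula: F = sum of (1/2)^(n1 + n2 + 1) * (1 + F_A).

    paths: iterable of (n1, n2, f_ancestor), where n1 and n2 are the numbers
    of generations from each parent up to a common ancestor, and f_ancestor
    is that ancestor's own inbreeding coefficient.
    """
    return sum(0.5 ** (n1 + n2 + 1) * (1 + f_a) for n1, n2, f_a in paths)


def coefficient_of_relationship(offspring_f):
    """Relationship between two non-inbred parents: twice the inbreeding
    coefficient their offspring would have."""
    return 2 * offspring_f


def genotype_frequencies(p, f):
    """Genotype frequencies (AA, Aa, aa) at allele frequency p and inbreeding f."""
    q = 1 - p
    return (p * p + f * p * q, 2 * p * q * (1 - f), q * q + f * p * q)
```

For the child of first cousins there are two common ancestors, each two generations above both parents, so `inbreeding_coefficient([(2, 2, 0), (2, 2, 0)])` is 1/16 and the parents' coefficient of relationship is 1/8. At p = 0.5 this lowers the heterozygote frequency from 0.5 to 0.46875.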
Part 7: Linkage analysis.
The frequency with which homologous chromosomes exchange segments during meiosis, when an individual forms its germ cells, is called the recombination fraction. Its size depends on the distance between two loci on the same chromosome: in general, the farther apart they are, the more opportunities for exchange and the higher the recombination fraction. A recombination fraction of 0.50, its theoretical maximum, indicates that the two loci segregate independently, as they do when they lie on different chromosomes. A low recombination fraction indicates that the two loci are close together and that their alleles are not transmitted independently to the next generation. This phenomenon is called linkage in genetics. This chapter mainly introduces Bayesian and Monte Carlo simulation methods for estimating the recombination fraction.
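To make the estimation problem concrete, the following Python sketch treats the simplest case: phase-known data with k recombinants among n informative meioses. It shows the maximum-likelihood estimate, the LOD score against free recombination, and a Bayesian posterior mean under a uniform prior on [0, 0.5] computed on a grid. The prior, the grid approximation, and all function names are assumptions for illustration. The thesis's own Bayesian and Monte Carlo procedures are not described in the abstract.

```python
import math


def recombination_mle(k, n):
    """Maximum-likelihood recombination fraction, capped at 0.5."""
    return min(k / n, 0.5)


def lod_score(k, n, theta):
    """log10 likelihood ratio of theta against free recombination (0.5).

    Direct products are used for clarity; very large n would need logs.
    """
    likelihood = theta ** k * (1 - theta) ** (n - k)
    return math.log10(likelihood / 0.5 ** n)


def posterior_mean_theta(k, n, grid_size=2000):
    """Posterior mean of theta under a uniform prior on [0, 0.5] (midpoint grid)."""
    step = 0.5 / grid_size
    thetas = [(i + 0.5) * step for i in range(grid_size)]
    weights = [t ** k * (1 - t) ** (n - k) for t in thetas]
    return sum(t * w for t, w in zip(thetas, weights)) / sum(weights)
```

With 2 recombinants in 20 meioses the estimate is 0.1 and the LOD score is about 3.2, while the posterior mean is somewhat higher, near 0.136, because the prior spreads weight across the whole interval.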
The analyses use multiple procedures from the genetics and stat modules of SAS 9.1.3 and SAS 9.2, together with custom programming, to carry out statistical computation on medical genetic data. Proceeding from simple to complex, the thesis systematically introduces the genetic statistical methods, their models, and their computational principles, treats the correction of genetic results, multi-locus disease association, and linkage analysis in particular detail, and puts forward new viewpoints. By emphasizing practical technique and convenient implementation, it provides both a statistical methodology for medical genetics and a new platform for data computation in this branch of the discipline.
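【Degree-granting institution】: Academy of Military Medical Sciences of the Chinese People's Liberation Army
【Degree level】: Master's
【Year degree granted】: 2010
【Classification number】: R311
Article ID: 2241915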
Article link: http://sikaile.net/yixuelunwen/shiyanyixue/2241915.html