Development and Application of SAGE, a Statistical Analysis Software Package for Genetic Epidemiology
[Abstract]: Background and research objectives
Genetic epidemiology is a frontier field that has developed rapidly in recent years. It studies the genetic and environmental factors that shape the distribution of diseases across populations and proposes reasonable preventive measures. Its theoretical foundations are population genetics and epidemiology: it combines epidemiological methods for collecting and processing population data with the experimental methods of molecular genetics and the principles and methods of biostatistics to study the separate effects of genetic and environmental factors on disease and their joint effects. With the discovery of polymorphic sequence markers during human genome sequencing, the search for disease genes is accelerating, and the study of polygenic diseases has long been a focus of attention.
For single-gene disorders that follow Mendelian inheritance, an effective research framework has been established and nearly one thousand disease-causing genes have been cloned. Polygenic diseases, however, are complex traits: they show some familial aggregation but do not fully follow Mendel's laws of inheritance. Mapping and genetic analysis of their susceptibility genes therefore still face many problems, and this has become a difficult and active area of medical genetics and genome research in recent years. Linkage analysis has become an important method for gene mapping. However, because genetic data are voluminous, the analyses are complicated and the data have a complex structure, general-purpose statistical methods and software cannot make full use of the information in the data, and their analytical power is limited.
Dedicated programs exist: FASTLINK, LINKAGE, VITESSE, GENEHUNTER, MERLIN and MELINK are available for parametric linkage analysis, and GENEHUNTER, MERLIN and MELINK for nonparametric linkage analysis. With its huge population and abundant demographic data, China is an excellent resource for studying human genetic information. At present, however, statistics and genetics are not well integrated, so geneticists face many problems in data collection and analysis, such as what data to collect, what sample size is needed and which genetic statistical methods to use. As a result, much of the information cannot be fully used, which is a great waste.
Because the phenotypes and genotypes of polygenic diseases do not correspond strictly one-to-one, their data must be analyzed with a variety of methods. This increasingly exposes the limitations of special-purpose genetic analysis programs. Moreover, foreign software is generally available only in English, so geneticists spend considerable manpower and material resources learning it. There is therefore an urgent need for powerful, comprehensive genetic statistics software, and the genetic epidemiology statistical analysis package SAGE (Statistical Analysis for Genetic Epidemiology) meets this need. SAGE is produced by the Human Genetic Analysis Resource (HGAR), established in the Department of Epidemiology and Biostatistics at Case Western Reserve University (CWRU) in Cleveland, USA, with funding from the US Public Health Service and the NIH National Center for Research Resources. The software was first developed in 1987 by the team of the well-known statistical geneticist R. C. Elston and has been updated continuously since, from the initial version 1.0 to the current version 5.3.0. Its functionality has kept expanding, and it receives growing attention in genetic epidemiological analysis.
Research methods
Using five example files distributed with SAGE as the raw data, each functional module is analyzed in detail. The thesis, which covers SAGE's one user-defined module and its functional modules, is divided into 18 chapters.
Chapter 1: Overview of SAGE. Describes the input and output files, the operating environment and the characteristics of the basic functional modules of SAGE. Users should note the system requirements when installing the software.
Chapter 2: Creating, editing and organizing SAGE data files. Mainly introduces three ways of creating data files, and how to import, export and rename projects.
Chapter 3: The user-defined module. Mainly introduces how to create genome data files and new variables, with emphasis on creating new variables.
Chapter 4: General pedigree statistics (PEDINFO). Introduces the function, principle and operation of PEDINFO and explains its output, with emphasis on interpreting the results. Each of the following 14 chapters likewise covers a module's function, principle, operating procedure and main output.
Chapter 5: Checking for non-Mendelian inheritance (MARKERINFO). Mainly used to detect non-Mendelian inheritance in pedigree marker data, helping users find inconsistent data. Using it requires an understanding of Mendel's laws of inheritance.
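The basic check behind this kind of screening can be sketched for a single parent-offspring trio at one autosomal locus. This is a minimal illustration assuming complete genotypes, no mutation, and alleles coded as strings; the function name `is_mendelian_consistent` is hypothetical, and this is not MARKERINFO's implementation, which works on whole pedigrees.

```python
from itertools import product

def is_mendelian_consistent(father, mother, child):
    """Hypothetical helper: True if the child's unordered genotype can be
    formed from one paternal and one maternal allele (autosomal locus,
    complete genotypes, no mutation assumed)."""
    possible = {tuple(sorted(pair)) for pair in product(father, mother)}
    return tuple(sorted(child)) in possible

# Father A/B and mother C/C: every child must carry C.
print(is_mendelian_consistent(("A", "B"), ("C", "C"), ("C", "A")))  # True
print(is_mendelian_consistent(("A", "B"), ("C", "C"), ("A", "B")))  # False
```

A False result flags the trio for review; it does not say which member's genotype is wrong.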
Chapter 6: Reclassification of relative pairs (RELTEST). Reclassifies the stated relationships of relative pairs using genome-scan multilocus marker data, mainly on the principle of alleles shared identical by descent (IBD). The emphasis is on understanding IBD and identity by state (IBS) and on interpreting the results.
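The IBS side of this distinction is directly observable from genotypes, whereas IBD must be inferred: two unrelated people who are both A/A share two alleles IBS but none IBD. The sketch below only counts IBS sharing at each locus and averages it over loci; the helper names are hypothetical and RELTEST itself works with IBD-based estimates rather than this raw count.

```python
from collections import Counter

def ibs_count(g1, g2):
    """Number of alleles (0, 1 or 2) shared identical by state at one locus:
    the size of the multiset intersection of the two genotypes."""
    shared = Counter(g1) & Counter(g2)
    return sum(shared.values())

def mean_ibs(genotypes1, genotypes2):
    """Average IBS count over loci for two individuals (hypothetical helper)."""
    counts = [ibs_count(a, b) for a, b in zip(genotypes1, genotypes2)]
    return sum(counts) / len(counts)

print(mean_ibs([("A", "B"), ("A", "A")], [("A", "C"), ("A", "A")]))  # 1.5
```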
Chapter 7: Allele frequency estimation (FREQ). Estimates marker allele frequencies from individuals in pedigrees of known structure and generates a locus description file, which can be used by GENIBD, MLOD and other SAGE programs. The module's main outputs are the locus description file and the estimated inbreeding coefficient.
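As a point of reference, the simplest allele frequency estimate is plain gene counting among unrelated individuals, for example pedigree founders. The sketch below shows only that naive estimator, assuming complete genotypes; it ignores relatives and is not the pedigree-based maximum likelihood method FREQ uses.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Gene-counting estimate: each allele's share of all allele copies
    among the supplied (assumed unrelated) individuals."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}

founders = [("A", "B"), ("A", "A"), ("B", "C"), ("A", "C")]
print(allele_frequencies(founders))  # {'A': 0.5, 'B': 0.25, 'C': 0.25}
```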
Chapter 8: Allelic association or quantitative trait transmission disequilibrium testing (ASSOC). Mainly used with pedigree data: covariates can be derived from marker phenotypes, and the module estimates residual familial correlations or heritability.
Chapter 9: Familial correlation analysis (FCOR). Mainly used to estimate the multivariate familial correlations for all types of relative pairs in pedigrees, together with their asymptotic standard errors.
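To illustrate what a single familial correlation is, the sketch below computes a sibling correlation for one trait by "double entry", counting each sib pair in both orders so that the result does not depend on which sib is listed first. This is an assumption-laden simplification: FCOR handles many relative-pair types, weighting schemes and standard errors, none of which appear here, and the helper names are hypothetical.

```python
import math

def pearson(x, y):
    """Ordinary Pearson product-moment correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sib_correlation(pairs):
    """Double-entry sib correlation: each pair (a, b) enters as (a, b) and (b, a)."""
    x = [a for a, b in pairs] + [b for a, b in pairs]
    y = [b for a, b in pairs] + [a for a, b in pairs]
    return pearson(x, y)

print(sib_correlation([(1.0, 1.2), (2.0, 1.8), (3.0, 3.3), (4.0, 3.6)]))
```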
Chapter 10: Mixed and complex segregation analysis (SEGREG). Mainly used to fit and select segregation analysis models using the familial relationships in the pedigree data. The trait may be continuous, binary, or binary with age of onset, and the module produces a penetrance file for model-based linkage analysis. The emphasis is on choosing suitable models for different traits.
Chapter 11: GENIBD. This module uses several algorithms to compute the single-point and multipoint probabilities that relative pairs in pedigrees share alleles IBD. The emphasis is on choosing suitable methods for different data.
Chapter 12: Age-of-onset analysis (AGEON). Applies to the joint analysis of age data from affected and unaffected individuals, allowing covariate adjustment of the mean, variance or skewness of the age-of-onset distribution.
Chapter 13: Haplotype analysis (DECIPHER). Mainly used to obtain maximum likelihood estimates of the population haplotype frequencies of autosomal or X-linked loci.
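A standard way to obtain such maximum likelihood estimates from unphased genotypes is the expectation-maximization (EM) algorithm. The sketch below applies it to two biallelic loci in unrelated individuals, where only double heterozygotes have ambiguous phase. It assumes Hardy-Weinberg equilibrium and genotypes coded as the number of "1" alleles at each locus; the function names are hypothetical and this is not DECIPHER's code.

```python
from itertools import combinations_with_replacement

# Haplotypes written as (allele at locus 1, allele at locus 2).
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def compatible_pairs(genotype):
    """Unordered haplotype pairs whose allele counts match the genotype,
    given as (count of '1' alleles at locus 1, count at locus 2)."""
    g1, g2 = genotype
    return [(h, k) for h, k in combinations_with_replacement(HAPLOTYPES, 2)
            if h[0] + k[0] == g1 and h[1] + k[1] == g2]

def em_haplotype_freqs(genotypes, iterations=200):
    """EM estimate of haplotype frequencies under Hardy-Weinberg equilibrium."""
    freqs = {h: 0.25 for h in HAPLOTYPES}
    n = len(genotypes)
    for _ in range(iterations):
        counts = {h: 0.0 for h in HAPLOTYPES}
        for g in genotypes:
            pairs = compatible_pairs(g)
            # E-step: posterior weight of each phase given current frequencies.
            weights = [freqs[h] * freqs[k] * (1 if h == k else 2) for h, k in pairs]
            total = sum(weights)
            for (h, k), w in zip(pairs, weights):
                counts[h] += w / total
                counts[k] += w / total
        # M-step: expected haplotype counts divided by 2n chromosomes.
        freqs = {h: c / (2 * n) for h, c in counts.items()}
    return freqs

data = [(2, 2)] * 3 + [(0, 0)] * 3 + [(1, 1)]
print(em_haplotype_freqs(data))
```

In this example the unambiguous individuals carry only the 00 and 11 haplotypes, so EM resolves the double heterozygote as 00/11 and the 01 and 10 frequencies shrink toward zero.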
Chapter 14: Model-based two-point linkage analysis (LODLINK). Mainly used to calculate two-point LOD scores between a main trait, analyzed under a specified genetic model, and each marker locus. The main trait may be any marker or other trait with Mendelian transmission. The emphasis is on specifying the main trait and on the penetrance file generated by the SEGREG program.
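The quantity being computed can be illustrated in the textbook case of phase-known, fully informative meioses with R recombinants and NR nonrecombinants, where LOD(θ) = log10[θ^R (1 − θ)^NR / 0.5^(R+NR)]. The sketch below evaluates it and maximizes it over a grid; the function names and the 0.01 grid are illustrative assumptions, and LODLINK itself computes pedigree likelihoods under the specified trait model rather than counting recombinants.

```python
import math

def lod(recombinants, nonrecombinants, theta):
    """Two-point LOD score for phase-known, fully informative meioses."""
    if theta == 0 and recombinants > 0:
        return float("-inf")  # any recombinant is impossible at theta = 0
    n = recombinants + nonrecombinants
    log_l = (recombinants * math.log10(theta) if recombinants else 0.0) \
        + nonrecombinants * math.log10(1 - theta)
    return log_l - n * math.log10(0.5)  # compare with free recombination

def max_lod(recombinants, nonrecombinants):
    """Grid search over theta = 0.01, 0.02, ..., 0.50; returns (theta, LOD)."""
    grid = [i / 100 for i in range(1, 51)]
    best = max(grid, key=lambda t: lod(recombinants, nonrecombinants, t))
    return best, lod(recombinants, nonrecombinants, best)

print(max_lod(2, 18))  # maximum near theta = 0.1, LOD about 3.20
```

The maximum falls at θ = R/(R + NR), here 0.1, and a LOD of 3 or more is the conventional threshold for declaring linkage.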
Chapter 15: Model-based multipoint linkage analysis (MLOD). Mainly used to compute model-based multipoint linkage statistics in small or large pedigrees. The emphasis is on creating the genome data file and specifying the main trait.
Chapter 16: Sib-pair linkage analysis (SIBPAL). Uses IBD allele sharing at a single marker or at multiple loci. Binary and continuous traits can be analyzed, multiple loci can be modeled simultaneously, and epistatic interactions and covariate effects can be included. The emphasis is on the settings each trait type requires.
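SIBPAL is built on Haseman-Elston regression. In its original form, the squared trait difference of each sib pair is regressed on the pair's estimated proportion of alleles shared IBD (π̂), and a significantly negative slope suggests linkage. The sketch below shows only that original single-marker form with ordinary least squares, assuming π̂ has already been estimated; SIBPAL implements later extensions (multiple loci, covariates, binary traits) that are not reproduced, and the function name is hypothetical.

```python
def haseman_elston(sib_pairs, pi_hat):
    """Original Haseman-Elston regression by ordinary least squares.
    sib_pairs: (trait of sib 1, trait of sib 2) per pair.
    pi_hat: estimated proportion of alleles shared IBD per pair.
    Returns (intercept, slope)."""
    y = [(a - b) ** 2 for a, b in sib_pairs]  # squared trait differences
    n = len(y)
    mean_x = sum(pi_hat) / n
    mean_y = sum(y) / n
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(pi_hat, y))
    sxx = sum((x - mean_x) ** 2 for x in pi_hat)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

pairs = [(3, 1), (2, 1), (5, 5), (3, 1), (2, 1), (5, 5)]
pi = [0.0, 0.5, 1.0, 0.0, 0.5, 1.0]
print(haseman_elston(pairs, pi))  # slope about -4: more IBD sharing, more similar traits
```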
Chapter 17: LOD-score linkage analysis of affected sib pairs (LODPAL). The program computes LOD scores from affected sib pairs; the current version implements the general conditional-logistic regression model. Particular attention should be paid to the relevant option settings.
Chapter 18: Transmission disequilibrium test (TDT). The program implements the basic TDT model, which tests for linkage between marker loci and a disease locus in the presence of linkage disequilibrium. The disease trait must be binary. Using it requires mastering the principle of the TDT.
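The basic TDT statistic compares, among parents heterozygous for a marker allele, the number of times that allele is transmitted to an affected child (b) with the number of times it is not (c): χ² = (b − c)² / (b + c), referred to a chi-square distribution with 1 degree of freedom. The sketch below assumes the transmitted allele of each informative parent is already known, which in practice requires resolving trio phase; the helper names are hypothetical and this is not SAGE's TDT program.

```python
import math

def count_transmissions(records, allele):
    """records: (parent_genotype, allele_transmitted_to_affected_child) tuples.
    Only parents heterozygous for `allele` are informative."""
    b = c = 0
    for (a1, a2), transmitted in records:
        if a1 == a2 or allele not in (a1, a2):
            continue  # homozygous parent, or parent does not carry the allele
        if transmitted == allele:
            b += 1
        else:
            c += 1
    return b, c

def tdt(b, c):
    """TDT statistic (b - c)^2 / (b + c) and its 1-df chi-square p-value.
    Requires b + c > 0."""
    stat = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(stat / 2))  # P(chi2_1 > stat) = P(|Z| > sqrt(stat))
    return stat, p_value

print(tdt(30, 15))  # statistic 5.0, p about 0.025
```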
Results
With this thesis, geneticists can make full use of their data in genetic statistical analysis while saving manpower and material resources. Learning the software can also guide geneticists in collecting genetic data so that the data are used as fully as possible, thereby speeding the development of genetic epidemiology.
【Degree-granting institution】: Southern Medical University (南方医科大学)
【Degree level】: Master's
【Year of degree】: 2007
【Classification number】: TP311.52; R181.3
【Author】: 陈莉雅 (Chen Liya)