Construction of a Cancer Predisposition Gene Database and Analysis of Its Copy Number Variation
Published: 2017-12-27 15:09
Source: Anhui University, master's thesis, 2017. Document type: degree thesis
Keywords: cancer predisposition gene; database; copy number variation; gene expression; network module
[Abstract]: Gene mutations can be classified by where they arise as somatic or germline. Somatic mutations are passed on only within somatic cell lineages and cannot be inherited directly by offspring, whereas germline mutations are transmitted from generation to generation. A gene that carries germline mutations or epigenetic mutations that increase the risk of cancer is called a cancer predisposition gene (CPG). Identifying cancer predisposition genes and studying the related biological mechanisms can support early prevention, early diagnosis, and early treatment of cancer, and also aids the search for cancer etiology, research on pathogenesis, and the development of related drugs. Most cancer predisposition genes act in a manner similar to tumor suppressor genes: loss of gene function leads to cancer. A minority resemble oncogenes: they acquire new functions through mutation, which disrupts the cell cycle and triggers cancer. Over the past few decades, as high-throughput technologies, particularly genome-wide mutation analysis (including exome sequencing and whole-genome sequencing), have developed and come into use, more and more cancer predisposition genes have been discovered. However, information on these genes and their functions is scattered, and there is as yet no systematic database of cancer predisposition genes. By collecting and curating cancer predisposition genes from different sources, we built a relatively comprehensive database resource. To further analyze copy number variation in these genes, we also studied the relationship between copy number variation and gene expression of cancer predisposition genes in pan-cancer samples. The main work is summarized as follows:

1. Construction of the cancer predisposition gene database. To provide a complete resource for exploring cancer predisposition genes and their molecular mechanisms, we first collected data from five sources: Rahman's data, PubMed, GeneReview, Online Mendelian Inheritance in Man (OMIM), and GeneRIF (Gene Reference Into Function). Then, through literature reading and analysis, we collected a total of 827 human cancer predisposition genes (724 protein-coding genes, 23 non-coding genes, and 80 genes for which NCBI currently gives no specific information), together with 637 rat and 658 mouse homologs of the human cancer predisposition genes. To better understand these genes, we used text mining to systematically collect annotation in eight categories for each gene: basic information, gene expression, methylation sites, post-translational modifications, germline mutations, interactions, pathway information, and drug information. On this basis, we built the cancer predisposition gene database website dbCPG (http://bioinfo.ahu.edu.cn:8080/dbCPG/index.jsp), where users can easily query, browse, upload, and download data. Finally, to assess the functions of the 724 protein-coding human cancer predisposition genes, we performed enrichment analysis with two online tools, KOBAS and DAVID, and network analysis with the Klein-Ravi algorithm in GenRev. As the first cancer predisposition gene database, dbCPG not only summarizes and organizes existing research results but also gives cancer researchers a platform from which data resources are easier to obtain.

2. Copy number variation of cancer predisposition genes. According to the "two-hit" hypothesis, cancer arises from the continued accumulation of germline and somatic mutations. In cancer biology, integrated analysis of germline and somatic mutations is therefore essential for identifying genes and related molecular pathways. Previous studies have shown that cancer susceptibility may be related to copy number variation of cancer predisposition genes. To analyze this systematically, we studied the relationship between somatic copy number variation and expression changes of predisposition genes in pan-cancer samples. First, based on copy number variation data from The Cancer Genome Atlas (TCGA), we found that 729 predisposition genes in dbCPG have definite copy number variation information. Further analysis of these genes showed that for 128 of them the number of samples with copy number loss (CNL) was twice the number with copy number gain (CNG). For these 128 genes, we combined TCGA expression data with the copy number loss data and obtained 49 cancer predisposition genes with both copy number loss and reduced expression. Five of these genes showed concordant copy number loss and down-regulated expression in at least 50 tumor samples: MT4P (216 samples), PTEN (143), MCPH1 (86), SMAD4 (63), and MINPP1 (51). This suggests that copy number loss may be a driving force behind changes in gene expression during carcinogenesis. Network analysis of the 49 genes showed that the genes in the extracted subnetwork are closely connected, which in turn suggests that they may share similar biological mechanisms in carcinogenesis. This is the first study of the relationship between copy number loss and down-regulated expression of cancer predisposition genes in pan-cancer samples; despite some limitations, these results will help give a more intuitive understanding of the biological functions of predisposition genes in carcinogenesis.
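The filtering steps described in the abstract (selecting genes whose CNL samples outnumber CNG samples, then counting samples in which copy number loss and reduced expression coincide) can be illustrated with a minimal Python sketch. This is not the thesis's actual pipeline: the input layout, the call labels ("loss", "gain", "down"), the function names, and the reading of "twice" as "at least twice" are all assumptions made for illustration, and the toy records below are invented, not TCGA data.

```python
from collections import Counter

# Hypothetical input layout: one (gene, sample_id, call) tuple per gene and sample.
# The labels "loss"/"gain" and "down"/"up"/"unchanged" are illustrative only;
# the abstract does not state the thresholds used to make these calls.

def select_cnl_dominant(cn_calls, ratio=2.0):
    """Return genes whose CNL sample count is at least `ratio` times the CNG count."""
    loss, gain = Counter(), Counter()
    for gene, _sample, call in cn_calls:
        if call == "loss":
            loss[gene] += 1
        elif call == "gain":
            gain[gene] += 1
    # Counter returns 0 for genes that never show a gain.
    return {gene for gene in loss if loss[gene] >= ratio * gain[gene]}

def concordant_counts(cn_calls, expr_calls, genes):
    """Count, per gene, the samples with both copy number loss and reduced expression."""
    down = {(g, s) for g, s, call in expr_calls if call == "down"}
    lost = {(g, s) for g, s, call in cn_calls if call == "loss" and g in genes}
    return Counter(gene for gene, sample in lost if (gene, sample) in down)

def recurrent_genes(counts, min_samples=50):
    """Return genes concordant in at least `min_samples` samples, most frequent first."""
    return [g for g, n in counts.most_common() if n >= min_samples]

# Toy records (invented for illustration).
cn_calls = [
    ("PTEN", "S1", "loss"), ("PTEN", "S2", "loss"), ("PTEN", "S3", "gain"),
    ("SMAD4", "S1", "loss"), ("SMAD4", "S2", "gain"),
    ("GENE_X", "S1", "gain"), ("GENE_X", "S2", "gain"),
]
expr_calls = [
    ("PTEN", "S1", "down"), ("PTEN", "S2", "unchanged"),
    ("SMAD4", "S1", "down"), ("GENE_X", "S1", "up"),
]

dominant = select_cnl_dominant(cn_calls)
counts = concordant_counts(cn_calls, expr_calls, dominant)
print(dominant, dict(counts), recurrent_genes(counts, min_samples=1))
```

In this toy input only PTEN passes the first filter (two loss samples against one gain sample), and only sample S1 shows both loss and reduced expression. With real data, `min_samples=50` would correspond to the abstract's "at least 50 tumor samples" cutoff.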
[Degree-granting institution]: Anhui University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: R73; Q811.4