

Integrated Analysis of Omics Data to Build a Platform for Plant Gene Structure Annotation and Functional Analysis

Posted: 2018-05-20 11:59

  Topic: Bioinformatics + Big data; Source: China Agricultural University, 2016 doctoral dissertation


[Abstract]: Big data refers to massive datasets that exceed the processing capacity of traditional relational database systems. With advances in sequencing technology and its biological applications, the life sciences have entered the era of big data, and mining these complex sequencing data is a central task for bioinformaticians. Starting from the needs of plant gene function research, this dissertation explores how existing bioinformatics methods can be used to dissect the multi-dimensional omics data generated by experimental scientists and to reveal the biology hidden behind them. It first designs a standardized pipeline for analyzing large-scale functional omics data to discover novel plant genes and new alternative splicing forms; it then builds an online gene set enrichment analysis tool for plants; finally, it constructs an integrated database and analysis platform for plant non-coding RNAs. Together these make useful attempts in several key directions of large-scale omics data mining.

Once the complete genome sequence of a species is available, genome-wide annotation of its gene structures becomes a research priority. As sequencing advances rapidly, epigenomic and transcriptomic data are also accumulating quickly. To use these data effectively for gene structure annotation, I built a standardized analysis pipeline. First, using data from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq), I studied two epigenetic modifications, H3K4me3 and H3K27ac, genome-wide in plants, and then used known functional genome annotations to examine how these histone modifications are distributed across gene structures. Transcriptome data confirmed that both histone modifications are positively correlated with gene expression.
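The abstract reports a positive correlation between the two histone marks and gene expression but does not say which statistic was used. As one possible way to quantify such a relationship, the sketch below computes Spearman's rank correlation between a per-gene mark signal and expression level in plain Python. It assumes the per-gene values (for example, H3K4me3 reads near each transcription start site and RNA-seq FPKM) were computed upstream; the gene IDs and numbers are hypothetical, and the choice of Spearman's rho is an assumption, not necessarily the dissertation's method.

```python
from typing import Dict, List


def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def spearman(mark: Dict[str, float], expr: Dict[str, float]) -> float:
    """Spearman's rho (Pearson on ranks) over genes present in both tables."""
    genes = sorted(mark.keys() & expr.keys())
    x = average_ranks([mark[g] for g in genes])
    y = average_ranks([expr[g] for g in genes])
    return pearson(x, y)


if __name__ == "__main__":
    # Hypothetical values: mark read counts near the TSS and FPKM for five genes.
    h3k4me3 = {"g1": 120, "g2": 15, "g3": 60, "g4": 0, "g5": 200}
    fpkm = {"g1": 35.0, "g2": 2.1, "g3": 10.4, "g4": 0.0, "g5": 80.2}
    print(f"Spearman rho = {spearman(h3k4me3, fpkm):.2f}")
```

For the toy data the script prints `Spearman rho = 1.00`, because both tables rank the five genes in the same order. A rank statistic is used because read counts and expression values are usually skewed; a real analysis would also need a significance test, which this sketch omits.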
After integrating in-house and publicly available transcriptome data, I predicted novel genes in rice (cultivar Nipponbare) and Asian cotton (Gossypium arboreum), and used the distribution of histone modifications along genes to assign each novel gene to the plus or minus strand. Several of these genes were validated by qRT-PCR. After locating the novel genes, I analyzed their gene structures, tissue-specific expression, and chromosomal histone modification patterns. Finally, I derived a set of rules for predicting alternative splicing sites in Asian cotton from RNA-seq and ChIP-seq data.

Building on gene structure annotation, the next question is how to use existing data to characterize gene function comprehensively. Existing plant GO enrichment tools such as EasyGO and AgriGO run statistical tests on GO terms to identify genes associated with enriched terms, helping biologists narrow the scope of their research. To study the function of one or more groups of differentially expressed genes in greater depth, I extended GO terms to the broader concept of a "gene set", describing gene function with up to nine gene set categories, including Gene Ontology (GO), Plant Ontology (PO), gene families, KEGG annotations, and PlantCyc annotations. Compared with any single category, gene sets markedly increase the fraction of annotated genes in the genome and greatly improve both the precision and breadth of functional descriptions. Using the GSEA algorithm, I developed PlantGSEA (http://structuralbiology.cau.edu.cn/PlantGSEA), a gene set enrichment analysis tool for plants. Since its publication it has been updated several times at users' request and has been widely recognized by researchers. In addition, secondary bioinformatics databases can provide multiple kinds of functional information for an individual DNA or protein sequence.
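PlantGSEA is described only as being built on the GSEA algorithm. The sketch below illustrates the core statistic of the original GSEA method under that assumption: genes are ranked by a metric such as a differential-expression score, a running sum steps up at each gene in the set (weighted by |metric|^p) and down by a constant at each gene outside it, and the enrichment score is the largest signed deviation from zero. It is not PlantGSEA's code; the scores and gene sets are hypothetical, and the permutation test, normalized enrichment score, and FDR control of the full method are omitted. A gene set here could stand for any of the categories above, such as the genes annotated with one GO term or one KEGG pathway.

```python
from typing import Dict, Set


def enrichment_score(scores: Dict[str, float], gene_set: Set[str], p: float = 1.0) -> float:
    """Signed maximum deviation of a GSEA-style running sum from zero.

    scores: gene -> ranking metric (larger means more up-regulated).
    gene_set: genes in one set, e.g. one GO term (hypothetical here).
    p: weight exponent; p = 1 weights hits by |metric|, p = 0 is unweighted.
    """
    # Rank genes from the highest to the lowest metric.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    hits = [gene in gene_set for gene, _ in ranked]
    n_total, n_hit = len(ranked), sum(hits)
    if n_hit == 0 or n_hit == n_total:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    # Normalizers so that hit steps and miss steps each sum to 1 over the list.
    hit_norm = sum(abs(m) ** p for (_, m), h in zip(ranked, hits) if h)
    miss_step = 1.0 / (n_total - n_hit)

    running, best = 0.0, 0.0
    for (_, metric), is_hit in zip(ranked, hits):
        running += abs(metric) ** p / hit_norm if is_hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    demo_scores = {"A": 3.0, "B": 2.0, "C": 1.0, "D": -1.0, "E": -2.0, "F": -3.0}
    print(enrichment_score(demo_scores, {"A", "B"}))  # set at the top: positive score
    print(enrichment_score(demo_scores, {"E", "F"}))  # set at the bottom: negative score
```

A set concentrated at the top of the ranking yields a score near +1 and one at the bottom a score near -1; in the full method the observed score is compared against scores from permuted data to judge significance.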
Epigenetic research covers not only histone modifications but also regulation by non-coding sequences. While surveying work on plant non-coding RNAs, I found that few existing databases cover many types of plant non-coding sequences together with functional information at multiple levels. After analyzing the strengths and weaknesses of existing platforms, I used the information and techniques I had acquired to build an integrated database platform for plant non-coding sequences, named PNRD (http://structuralbiology.cau.edu.cn/PNRD). PNRD collects 25,739 non-coding RNA sequences of 11 classes from 150 plant species, 178,138 miRNA-target interaction pairs from 46 plant species, 35 miRNA expression profiles, and a literature-mining pool integrating information from 148 publications. The platform has five functional modules: search, browse, tools, download, and help.

This dissertation aims to build a platform system for plant gene structure and function annotation and for omics data mining, offering some solutions for the integrated analysis of massive datasets. Faced with high-throughput data that have complex backgrounds and heavy noise, our mission as bioinformaticians is to sharpen the insight of experimental scientists so that they can discover the value hidden behind the data.
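The abstract lists PNRD's data types and modules but not its internal design. Purely as an illustration, the sketch below shows one hypothetical relational layout for two of those data types (non-coding RNA records by class and species, and miRNA-target pairs) using Python's built-in `sqlite3` module, plus the kind of lookups a search or browse module might run. All table names, column names, IDs, and sequences are invented placeholders and do not reflect PNRD's actual schema or contents.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema for illustration; not PNRD's real design.
SCHEMA = """
CREATE TABLE ncrna (
    id       TEXT PRIMARY KEY,
    species  TEXT NOT NULL,
    rna_type TEXT NOT NULL,  -- ncRNA class, e.g. miRNA or lncRNA
    sequence TEXT NOT NULL
);
CREATE TABLE mirna_target (
    mirna_id    TEXT NOT NULL REFERENCES ncrna(id),
    target_gene TEXT NOT NULL,
    PRIMARY KEY (mirna_id, target_gene)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with placeholder records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO ncrna VALUES (?, ?, ?, ?)",
        [
            ("demo-miR-1", "Oryza sativa", "miRNA", "ACGUACGUACGUACGUACGU"),
            ("demo-miR-2", "Oryza sativa", "miRNA", "UGCAUGCAUGCAUGCAUGCA"),
            ("demo-lnc-1", "Oryza sativa", "lncRNA", "AAAACCCCGGGGUUUU"),
        ],
    )
    conn.executemany(
        "INSERT INTO mirna_target VALUES (?, ?)",
        [("demo-miR-1", "GENE_A"), ("demo-miR-1", "GENE_B"), ("demo-miR-2", "GENE_C")],
    )
    conn.commit()
    return conn


def targets_of(conn: sqlite3.Connection, mirna_id: str) -> List[str]:
    """All recorded target genes of one miRNA."""
    rows = conn.execute(
        "SELECT target_gene FROM mirna_target WHERE mirna_id = ? ORDER BY target_gene",
        (mirna_id,),
    )
    return [row[0] for row in rows]


def mirnas_targeting(conn: sqlite3.Connection, gene: str) -> List[str]:
    """Reverse lookup: miRNAs recorded as targeting one gene."""
    rows = conn.execute(
        "SELECT mirna_id FROM mirna_target WHERE target_gene = ? ORDER BY mirna_id",
        (gene,),
    )
    return [row[0] for row in rows]


def count_by_class(conn: sqlite3.Connection, species: str) -> List[Tuple[str, int]]:
    """Number of ncRNA records per class for one species (a browse-style summary)."""
    rows = conn.execute(
        "SELECT rna_type, COUNT(*) FROM ncrna WHERE species = ? "
        "GROUP BY rna_type ORDER BY rna_type",
        (species,),
    )
    return list(rows)


if __name__ == "__main__":
    db = build_demo_db()
    print(targets_of(db, "demo-miR-1"))
    print(mirnas_targeting(db, "GENE_C"))
    print(count_by_class(db, "Oryza sativa"))
```

Keeping interaction pairs in their own table, keyed on both partners, lets the same data answer forward (miRNA to targets) and reverse (gene to miRNAs) queries without duplication.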
[Degree-granting institution]: China Agricultural University
[Degree level]: Doctoral
[Year conferred]: 2016
[Classification number]: Q943.2


