
Genome Re-annotation of Yersinia pestis and Construction of Its Cross-omics Database System

Published: 2018-04-04 02:17

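  Topic: Yersinia pestis  Focus: re-annotation  Source: Academy of Military Medical Sciences of the Chinese People's Liberation Army, doctoral dissertation, 2016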


[Abstract]: Yersinia pestis is a high-risk bacterium that can cause fatal systemic infection. The world has seen three plague pandemics, with a combined death toll of more than 100 million. According to WHO data, 18 plague public-safety incidents occurred worldwide between 2001 and 2015 alone. China has identified 12 typical natural plague foci, distributed across 15 provinces and covering 15% of the country's land area. Since the Sanger laboratory published the first complete Y. pestis genome (strain CO92) in 2001, finished genome sequences of 12 Y. pestis strains have been released, all of them annotated.

With the rapid development of high-throughput experimental techniques, research on Y. pestis has produced a large volume of data, and the theoretical understanding of its pathogenesis and transmission has improved. Re-examining genome annotations that rest on earlier knowledge shows that the original information has many limitations and even errors, and that these errors are continually copied, amplified, and spread by annotation methods based on homologous sequence alignment. Researchers have re-annotated individual Y. pestis genomes using comparative genomics, transcriptomics, and proteogenomics, but those efforts concentrated on correcting gene functions and discovering new genes, and their data coverage is incomplete. It is therefore necessary to consolidate, integrate, and improve the existing Y. pestis knowledge base: starting from earlier annotation results, to add new experimental data, re-analyze the sequences with improved algorithms, and correct likely annotation errors, so as to refine the genome annotation and ultimately deepen, in a systematic way, the understanding of Y. pestis function, biological behavior, and pathogenic mechanisms.

Data sharing is an important way to advance research. Apart from the large public databases, however, only a very small number of prokaryotic model organisms (for example, Escherichia coli) have dedicated omics databases. To give researchers more complete, accurate, and easily searchable Y. pestis annotation, it is necessary to build a cross-omics database system for this species on top of the collected experimental data of multiple types and the re-annotation results.

The main data sources of this study are: (1) genome sequences: the finished genomes of 12 Y. pestis strains provided by NCBI, which are the basis of the annotation; (2) proteomic mass spectrometry data for strain 91001: mass spectrometry results are formatted, standardized data that can be used conveniently after filtering, and because they come directly from experiments their quality is high; (3) RNA-seq data from RNA sequencing of strain 91001; (4) expression profile data from microarray experiments on strain 91001, which show gene expression levels under a variety of conditions and, although difficult to use in annotation for now, can serve as a reference for researchers. Relevant data published in the literature were also added.

The re-annotation work has two parts. The first is data preprocessing: multi-omics data are combined with bioinformatics software and databases in a de novo annotation approach. Starting from gene prediction, CDS regions are re-identified and the start sites of some genes are corrected; gene functions are assigned using several protein annotation databases; in non-genic regions, ncRNAs are annotated using prediction tools, databases, and the literature; finally, repeat sequences, mobile elements, and similar features are annotated across the whole genome. The second part is data curation and analysis: the data are classified and filtered, data standards are defined, and the data are standardized, after which gene homology analysis, allele analysis, and related analyses are carried out. The whole process requires installing locally and running more than 30 software tools and databases.

The cross-omics database is built on a set of omics database tables, and the different data types are closely interrelated. The database was constructed with an information-systems methodology that takes the biological characteristics of Y. pestis into account. Once the research goals were set, the work started from researchers' needs: requirements analysis, a feasibility assessment of the system, an understanding of functional and business requirements, preliminary data standards, and a data model. The structure and functions of the omics database were then designed from the data model. Finally, the web service system was developed on a MySQL relational database using the Python Django framework.

Combining genomic, proteomic, transcriptomic, and other omics data with the methods above, this study first carried out a comprehensive re-annotation of Y. pestis strain 91001: 137 unreliable coding regions were removed; 41 gene start sites were corrected, as were the functions of 7 pseudogenes and 392 hypothetical genes; and annotations were added for special genomic elements such as ncRNAs, repeat sequences, and mobile elements, and for genomic segment diversity. By organizing and integrating the analysis algorithms and software, a semi-automated re-annotation workflow applicable to other Y. pestis strains was established and then applied to the finished genome sequences of the other 11 strains. Finally, using a relational database and a web framework, an Internet-based Y. pestis cross-omics database system, the TODY analysis platform (http://tody.bmi.ac.cn/), was built so that researchers can query and use the re-annotation data. Parallel computing and a distributed scheduling system were used in processing allele diversity and in implementing the web service system, which greatly reduced computation time and provides know-how and technical support for later large-scale data analysis and processing.

This work integrates biological experiments, biological knowledge, bioinformatics tools, and computer technology, and is significant for clarifying the structure and function of the Y. pestis genome and for revealing more of its biological characteristics. In the next stage we will add more literature and experimental data to keep enriching the Y. pestis omics database; validate the accuracy of the re-annotation results experimentally; look for suitable data mining models, carry out deeper data analysis, and build a Y. pestis knowledge base; keep improving the web service system; and migrate the whole system to a cloud computing platform to support large-scale data processing.
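The abstract describes the cross-omics database as a set of related omics tables in a relational database. The sketch below is only an illustration of that idea, not the TODY schema: it uses Python's built-in sqlite3 in place of MySQL so that it runs without a server, and every table name, column name, locus tag, peptide string, and expression value is a hypothetical placeholder invented to exercise the schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions,
# not taken from the TODY platform. SQLite stands in for MySQL here.
SCHEMA = """
CREATE TABLE gene (
    id        INTEGER PRIMARY KEY,
    strain    TEXT NOT NULL,
    locus_tag TEXT NOT NULL UNIQUE,
    start_pos INTEGER NOT NULL,
    end_pos   INTEGER NOT NULL,
    strand    TEXT NOT NULL CHECK (strand IN ('+', '-')),
    product   TEXT
);
CREATE TABLE peptide_evidence (
    id      INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL REFERENCES gene(id),
    peptide TEXT NOT NULL
);
CREATE TABLE expression (
    id             INTEGER PRIMARY KEY,
    gene_id        INTEGER NOT NULL REFERENCES gene(id),
    condition_name TEXT NOT NULL,
    level          REAL NOT NULL
);
"""


def build_demo_db():
    """Create an in-memory database filled with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder genes: coordinates and locus tags are made up for the demo.
    conn.executemany(
        "INSERT INTO gene (id, strain, locus_tag, start_pos, end_pos, strand, product) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "91001", "DEMO_0001", 100, 400, "+", "hypothetical protein"),
            (2, "91001", "DEMO_0002", 500, 900, "-", "hypothetical protein"),
        ],
    )
    # Placeholder proteomic evidence: two peptides mapped to the first gene only.
    conn.executemany(
        "INSERT INTO peptide_evidence (gene_id, peptide) VALUES (?, ?)",
        [(1, "MKTAYIAK"), (1, "LSDEEK")],
    )
    # Placeholder expression levels under one made-up condition.
    conn.executemany(
        "INSERT INTO expression (gene_id, condition_name, level) VALUES (?, ?, ?)",
        [(1, "demo_condition", 12.5), (2, "demo_condition", 0.0)],
    )
    conn.commit()
    return conn


def evidence_summary(conn):
    """Return (locus_tag, peptide_count, max_expression) for every gene."""
    query = """
        SELECT g.locus_tag,
               (SELECT COUNT(*) FROM peptide_evidence p WHERE p.gene_id = g.id),
               (SELECT MAX(e.level) FROM expression e WHERE e.gene_id = g.id)
        FROM gene g
        ORDER BY g.locus_tag
    """
    return conn.execute(query).fetchall()
```

Calling `evidence_summary(build_demo_db())` returns one row per gene with its peptide count and highest recorded expression level. Keying every omics table on the gene record is one simple way to let a single query combine genomic, proteomic, and transcriptomic evidence; a production system on MySQL and Django would likely need strain, version, and provenance tables as well.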

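[Degree-granting institution]: Academy of Military Medical Sciences of the Chinese People's Liberation Army
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: R378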

Article ID: 1707930


Article link: http://sikaile.net/yixuelunwen/jichuyixue/1707930.html

