
Exploration of Bioinformatics Analysis Methods for Viral Genomes in High-Throughput Sequencing Data

Posted: 2018-07-30 07:47
[Abstract]: Viruses are infectious agents that can replicate only inside living host cells. They are minute and structurally simple; apart from prions, which consist only of protein, viruses are composed of protein and a single type of nucleic acid (DNA or RNA) that serves as the genetic material. Viruses are highly diverse and have a wide host range: any organism with a cellular structure can be a viral host. As the carrier of viral genetic information, the viral genome is the core data for studying viruses. With the spread of high-throughput sequencing (HTS), sequencing viral genomes has become the main means of studying viral genetics and evolution, and the large volumes of data it produces call for bioinformatics analyses that extract as much useful viral-genome information as possible. The purpose of this thesis is to explore bioinformatics methods for analyzing viral genomes in HTS data of different types. Starting from the HTS data and analysis needs accumulated by our research group, we explored methods for mining useful viral-genome information from these data. Focusing on pathogenic microorganisms, the work is divided into two parts: (1) analysis of lysogenic phages in bacterial HTS data; (2) virus discovery and genome analysis in complex sequencing samples.

Analysis of lysogenic phages in bacterial HTS data. Lysogenic phages are viruses that integrate into the host bacterial genome and are passed on as the host replicates. Under certain inducing conditions they can also excise from the host genome and produce progeny phages that are released. This replication strategy enables them to mediate horizontal gene transfer, and they often have an important effect on the pathogenicity of their hosts; for example, the main virulence gene of the enterohemorrhagic Escherichia coli O104:H4 identified in Germany is encoded by a prophage.
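Integration and excision of a lysogenic phage usually occur by site-specific recombination between attachment sites, so an integrated prophage is often flanked by two copies of a short core sequence (attL and attR). The abstract does not say how integration sites were located in this work; the Python sketch below shows one generic way to propose a candidate core, assuming the approximate prophage boundaries are already known and the prophage is longer than twice the search window. The function names, the `window` and `min_len` defaults, and the coordinate convention are hypothetical choices for illustration.

```python
from typing import Optional, Tuple


def longest_common_substring(a: str, b: str) -> Tuple[str, int, int]:
    """Return (substring, start in a, start in b) of the longest common substring.

    Classic O(len(a) * len(b)) dynamic programming that keeps only one previous row.
    """
    best_len, best_a, best_b = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1  # extend the match ending one base earlier
                if cur[j] > best_len:
                    best_len = cur[j]
                    best_a, best_b = i - best_len, j - best_len
        prev = cur
    return a[best_a:best_a + best_len], best_a, best_b


def find_att_core(genome: str, start: int, end: int,
                  window: int = 200, min_len: int = 12) -> Optional[Tuple[str, int, int]]:
    """Propose an att core shared by the two prophage boundaries (hypothetical helper).

    `start`/`end` are approximate 0-based prophage boundaries, e.g. from a prophage
    predictor. Returns (core, attL position, attR position) in genome coordinates,
    or None if no shared sequence reaches `min_len`.
    """
    if end - start < 2 * window:
        raise ValueError("prophage shorter than 2 * window; flank windows would overlap")
    left_lo = max(0, start - window)
    right_lo = max(0, end - window)
    left = genome[left_lo:start + window].upper()
    right = genome[right_lo:end + window].upper()
    core, ia, ib = longest_common_substring(left, right)
    if len(core) < min_len:
        return None
    return core, left_lo + ia, right_lo + ib
```

On real genomes the longest shared string may come from other repeats rather than the att core, so a candidate would still need to be checked against the annotation, for instance by confirming that an integrase gene lies next to it, as the abstract reports for integrated phages.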
In this work, genome sequencing data from 72 bacterial strains isolated from patients with foot ulcers were studied, with the lysogenic replication mechanism as the theoretical model, in order to discover new lysogenic phage genomes and characterize their integration, providing a basis for understanding phage biology and for preventing and controlling infections by highly pathogenic bacteria. Data were processed and analyzed with a combination of bioinformatics software and in-house programs. NGS QC Toolkit v2.3.3 was used for quality control of the raw sequencing data, removing short and low-quality reads. Given the characteristics of Ion Torrent data, the commercial software Newbler v3.0 was chosen for assembly. A prophage prediction pipeline was built with Perl scripts and applied to the assembled contigs. To obtain the genomes of active prophages, two auxiliary assembly tools were used: the ContigScape plug-in to display the connections between assembled contigs, and the commercial software CLC Genomics Workbench 9 to adjust sequences and check the assembly. Contigs were then joined with in-house laboratory software, and the resulting lysogenic phage genomes were annotated with the RAST online annotation tool. Finally, genome structure, integration sites, and evolutionary relationships were analyzed together to extract further information. In 11 of the 72 bacterial genome datasets, prophages were found to have excised from the bacterial genome and to be replicating. Assembling these phage sequences yielded 14 complete genomes of activated prophages, 11 of which share very low homology with known phage sequences and are newly discovered in this work. These new sequences show that the method can be used to discover new lysogenic phages and expand researchers' knowledge of phages.
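The quality-control step described above (NGS QC Toolkit v2.3.3 removing short and low-quality reads) can be illustrated with a minimal Python sketch. It is not NGS QC Toolkit and does not reproduce its options: the length and mean-quality cutoffs are hypothetical, the input is assumed to be four-line FASTQ with Phred+33 quality encoding, and the function names are illustrative.

```python
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from consecutive four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def filter_reads(reads: Iterable[Read], min_len: int = 50,
                 min_mean_q: float = 20.0) -> List[Read]:
    """Keep reads that are at least `min_len` long with mean quality >= `min_mean_q`."""
    return [r for r in reads
            if len(r[1]) >= min_len and mean_phred(r[2]) >= min_mean_q]
```

A real pipeline would often trim low-quality read ends before applying a length filter, since Ion Torrent reads vary in length; that step is left out here to keep the sketch short.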
In the integrated state, each phage integrase gene lies immediately adjacent to its integration site. Integration-site sequences of lysogenic phages vary in length and features but correlate with the corresponding integrase. The same integration site can be used by several lysogenic phages with similar integrases, which suggests a new approach to prophage prediction. Lysogenic phages whose hosts belong to the same bacterial genus have similar genome structures.

Virus discovery and genome analysis in complex sequencing samples. Because virus isolation and culture take a long time and often fail, complex samples are frequently sequenced directly and the useful viral information is then extracted from the data, which poses challenges for analysis. In recent years our group has used HTS to detect pathogens in clinical samples, which requires analyses that find pathogens quickly and accurately. Because no single existing bioinformatics tool met our needs for complex samples, we developed the analysis software "High-Throughput Sequencing Data Pathogen Classification Analysis Software v1.0". It detects four types of pathogens (bacteria, fungi, protozoa, and viruses) and performs well at discovering both known and unknown viruses in complex samples. Discovery of a known virus in a complex sample is illustrated by an imported case of Rift Valley fever identified in Beijing in July 2016. Analyzing the sequencing data with the software revealed a large number of Rift Valley fever virus (RVFV) sequences, confirming RVFV as the causative agent, and the complete genome of this strain was obtained without delay. The strain is most homologous to the Kakamas strain found in South Africa in 2009, and evolutionary analysis indicates that it has not undergone recombination. Discovery of an unknown virus in a complex sample is illustrated by Menghai rhabdovirus. The virus was isolated from Aedes albopictus mosquitoes captured in the Menghai region of Yunnan; after culture in C6/36 cells, it could not be identified with common viral primers.
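The abstract does not describe how the pathogen classification software assigns reads to bacteria, fungi, protozoa, or viruses. As a generic stand-in only, the Python sketch below uses exact k-mer matching against reference sequences grouped by category: each read goes to the category sharing the most k-mers with it, and reads with no hits or a tie stay unclassified. In a case like Menghai rhabdovirus, the reads left unclassified after removing host, bacterial, and known viral reads would be the natural input for de novo assembly. The function names, the tie rule, and the tiny k used in testing are assumptions; practical classifiers use much longer k-mers and far larger reference databases.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of an upper-cased sequence (empty if shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references: Dict[str, List[str]], k: int) -> Dict[str, Set[str]]:
    """Map every reference k-mer to the set of categories it occurs in."""
    index: Dict[str, Set[str]] = {}
    for category, seqs in references.items():
        for seq in seqs:
            for km in kmers(seq, k):
                index.setdefault(km, set()).add(category)
    return index


def classify_read(read: str, index: Dict[str, Set[str]], k: int) -> str:
    """Assign a read to the category with the most shared k-mers."""
    votes: Counter = Counter()
    for km in kmers(read, k):
        for category in index.get(km, ()):
            votes[category] += 1
    ranked = votes.most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return "unclassified"  # no shared k-mers, or a tie between categories
    return ranked[0][0]


def summarize(reads: Iterable[str], index: Dict[str, Set[str]], k: int) -> Counter:
    """Count reads per assigned category."""
    return Counter(classify_read(r, index, k) for r in reads)
```

Counting votes per category keeps the sketch simple; ambiguous k-mers shared by several categories add a vote to each, which is one reason real tools resolve assignments on a taxonomy instead.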
By analyzing its HTS data and excluding interfering sequences from host cells, bacteria, and other viruses, we obtained the complete genome of this virus. Sequence analysis showed it to be a novel rhabdovirus, named Menghai rhabdovirus, most similar to two other mosquito-borne rhabdoviruses found in Peru. As part of the genome analysis of Menghai rhabdovirus, the terminal sequences of 93 selected rhabdovirus reference sequences were also analyzed. Forty-five of them, distributed across different genera, have short inverted terminal repeats. Members of the genus Lyssavirus (which includes rabies virus) share a highly conserved terminal sequence, "ACGCTTAAC", while viruses in the four genera Ephemerovirus, Vesiculovirus, Tibrovirus, and Sprivivirus share the consensus terminal sequence "ACGAAGA". Because genome termini are often involved in replication and tend to be relatively strictly conserved, short inverted terminal repeats are likely a characteristic feature of genomes in the family Rhabdoviridae.

In summary, building on existing viral-genome analysis methods, this thesis proposes a method that uses bacterial sequencing data to recover the complete genomes and integration sites of activated prophages; it can be used to discover new lysogenic phages and adds to knowledge of them. We developed the HTS data pathogen classification software, obtained software copyright registration for it, and used it successfully to detect unknown pathogens. Data analysis also led to the discovery of a new rhabdovirus and to a characterization of the terminal sequences of rhabdovirus genomes. Viral-genome analysis still requires methods designed for each research object and analysis need.
We hope the methods and conclusions presented here offer other researchers a useful reference and new ideas.
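The terminal-sequence analysis in the abstract (short inverted terminal repeats in 45 of 93 rhabdovirus references, with consensus termini "ACGCTTAAC" in Lyssavirus and "ACGAAGA" in four other genera) can be checked on a single sequence with a small Python sketch. It assumes each genome is given as one complete string (U is converted to T) and treats an inverted terminal repeat as a prefix whose reverse complement equals the suffix. The function names and the 50-base search limit are hypothetical, and this is not the thesis's own procedure.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

# Consensus termini as reported in the abstract.
CONSENSUS = {
    "Lyssavirus": "ACGCTTAAC",
    "Ephemerovirus/Vesiculovirus/Tibrovirus/Sprivivirus": "ACGAAGA",
}


def normalize(seq: str) -> str:
    """Upper-case and convert RNA (U) to DNA (T)."""
    return seq.upper().replace("U", "T")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return normalize(seq).translate(_COMPLEMENT)[::-1]


def inverted_terminal_repeat(seq: str, max_len: int = 50) -> int:
    """Length of the longest prefix whose reverse complement is the sequence's suffix."""
    s = normalize(seq)
    best = 0
    for k in range(1, min(max_len, len(s) // 2) + 1):
        # If length k fails, every longer k fails too, so stop at the first mismatch.
        if revcomp(s[:k]) != s[-k:]:
            break
        best = k
    return best


def matching_consensus(seq: str) -> List[str]:
    """Groups whose consensus terminus starts the sequence or, reverse-complemented, ends it."""
    s = normalize(seq)
    return [group for group, motif in CONSENSUS.items()
            if s.startswith(motif) or s.endswith(revcomp(motif))]
```

Running `inverted_terminal_repeat` over a set of reference genomes and counting nonzero results would give the kind of tally described in the abstract, although a real analysis would also need to confirm that each reference record is complete up to the terminal nucleotide.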
[Degree-granting institution]: Academy of Military Medical Sciences, Chinese People's Liberation Army
[Degree level]: Master's
[Year of award]: 2017
[Classification numbers]: R373; Q811.4

