Re-annotation of Protein-Coding Genes in the Shewanella loihica PV-4 Genome
Published: 2018-06-28 08:24
Topics: gene prediction + genome re-annotation; Source: Shandong Normal University, master's thesis, 2016
[Abstract]: Prediction of protein-coding genes in prokaryotic genomes has been under way for nearly 30 years. Because prokaryotic genes lack introns, gene prediction in prokaryotes has long been regarded as relatively simple. A growing body of work, however, shows that different gene prediction algorithms give substantially different results. False positives and false negatives therefore accumulate, protein-coding genes are widely misannotated in bioinformatics databases, database quality suffers, and erroneous research conclusions can follow. This study therefore combines an algorithm for re-annotating over-annotated protein-coding genes in prokaryotic genomes with an ab initio gene prediction algorithm, giving a re-annotation algorithm for prokaryotic protein-coding genes. The algorithm was applied to the genome of Shewanella loihica PV-4, a strain with important applications in bioenergy and environmental remediation, and identified 1 over-annotated gene and 30 under-annotated genes. Evaluated on protein-coding genes of known function, the prediction performance indices Ac, MCC, and AUC were 99.93%, 0.9986, and 0.9999, respectively. Functional prediction of the 30 new genes with BLAST, COG, and other methods assigned a definite biological function to 6 under-annotated protein-coding genes and placed 2 under-annotated genes in COG category "R". The 6 genes with a definite function comprise 2 phosphoribosylglycinamide formyltransferase 2 genes, 2 glucose-1-phosphate thymidylyltransferase genes, 1 membrane protein gene, and 1 transposase gene; these genes may play important roles in ion exchange and protein modification. Further analysis indicated that the re-annotation algorithm is accurate and reliable.
On this basis, the algorithm was extended to the genomes of 9 other Shewanella strains, yielding 64 over-annotated genes and 1036 under-annotated genes. Further functional analysis found that 261 of the under-annotated genes have a definite biological function and 259 have a COG functional classification. Among the 261 genes with a definite function, "transposase" genes are the most common: there are 123, about 47% of the total. There are also 16 "integrase" genes, 5 "dehydrogenase" genes, and 3 "cytochrome C" genes, among others; these functional genes play an indispensable role in ion exchange and signal transduction. Among the 259 under-annotated genes with a COG classification, 182 new genes are related to cytochrome C, indicating involvement in ion transport and protein modification; 30 are related to flagellar genes, indicating a close link to cell motility; and 48 are related to chemotaxis proteins, indicating a close link to cell motility and signal transduction. This work therefore provides reliable data to support further study of S. loihica PV-4 and other Shewanella strains, and offers a new approach to prokaryotic genome annotation.
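The abstract reports Ac, MCC, and AUC but does not define them. As a minimal sketch, assuming Ac is plain accuracy, MCC is the Matthews correlation coefficient, and AUC is the area under the ROC curve, the three indices can be computed as below. The function names, confusion-matrix counts, and scores are hypothetical illustrations and are not taken from the thesis.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct (assumed meaning of 'Ac')."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(pos_scores, neg_scores):
    """Rank-based AUC: probability that a random positive outscores a
    random negative, counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical counts: coding ORFs are positives, non-coding ORFs negatives.
    tp, tn, fp, fn = 95, 90, 5, 10
    print("Ac  =", accuracy(tp, tn, fp, fn))
    print("MCC =", mcc(tp, tn, fp, fn))
    # Hypothetical classifier scores for coding and non-coding ORFs.
    print("AUC =", auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
```

The pairwise AUC loop is quadratic in the number of ORFs, which is fine for illustration; a sort-based rank computation would be the usual choice for genome-scale data.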
[Degree-granting institution]: Shandong Normal University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: Q78
Article ID: 2077360
Article link: http://sikaile.net/kejilunwen/jiyingongcheng/2077360.html