Prediction of Bacterial Essential Genes and Analysis of Their Evolutionary Characteristics
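Topics: essential genes + compositional features. Source: master's thesis, University of Electronic Science and Technology of China (電子科技大學), 2016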
[Abstract]: Essential genes play a critical role in bacterial survival: the proteins they encode sustain normal growth and reproduction. Once the essential genes of a pathogen have been identified, they can serve as drug targets for treating disease; theoretical study of bacterial essential genes also helps us understand the origin and evolution of life. Predicting bacterial essential genes has therefore become an increasingly important topic in bioinformatics. Among the available approaches, experimental determination is undoubtedly the most accurate, but it is slow, laborious, and expensive, so essential genes have been determined for only a few species to date, and computational methods are receiving growing attention. This thesis takes bacterial essential genes as its main subject and predicts them with a computational approach based on compositional features.

We first extracted compositional features from the Escherichia coli genome sequence according to its annotation file. We then classified the compositional variables with a support vector machine (SVM) and with principal component regression (PCR), and measured classifier performance by the area under the curve (AUC). This is also the first use of principal component regression for predicting bacterial essential genes. The AUC was 0.83 for SVM and 0.87 for PCR. We then improved both methods. Before the SVM step, the compositional variables were subjected to feature analysis (ttSVM) to filter out variables that show no clear difference between essential and non-essential genes. For PCR, a kernel function was added (KPCR) to improve its ability to classify nonlinear features. After these improvements, ttSVM reached an AUC of up to 0.87 and KPCR reached 0.84. We then applied the four methods to all other species whose essential genes have been determined experimentally; the highest AUC was 0.95. Finally, using the species with an AUC greater than 0.8, we built prediction models and set up a free web service, IBEG (http://cefg.uestc.edu.cn/ibeg/), with which researchers can both predict the essentiality of uncharacterized genes using different methods and compare the strengths and weaknesses of those methods.

In addition, we compared essential genes, genes with high codon usage, and highly expressed genes across species from two perspectives: functional genes and horizontally transferred genes. Among functional genes, essential genes make up the largest proportion, indicating that a relatively large share of essential genes have functions; the more important a gene's function is to the organism, the more conserved its evolution. Among horizontally transferred genes, essential genes also make up the largest proportion, indicating that the functions of essential genes include some housekeeping genes, which are therefore prone to horizontal transfer. In summary, this thesis applies new methods to the prediction of bacterial essential genes from compositional features, adds new compositional features, and examines the evolutionary aspects of these genes. Some problems remain and require further study and refinement.
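The pipeline described in the abstract (compositional features, a filter that drops uninformative variables, and AUC as the performance measure) can be illustrated with a small standard-library sketch. Everything concrete here is an assumption made for illustration: the abstract does not list the actual compositional features (dinucleotide frequencies are used as a stand-in), it does not name the statistical test behind ttSVM (a Welch t-statistic is assumed), and AUC is taken to mean the area under the ROC curve. The function names and the toy sequences are hypothetical, and no SVM or PCR model is trained.

```python
import math
from collections import Counter
from itertools import product
from statistics import mean, variance

BASES = "ACGT"


def composition_features(seq, k=2):
    """Relative frequency of every k-mer over ACGT, in a fixed order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # k-mers with ambiguous bases are ignored
    return [counts[m] / total if total else 0.0 for m in kmers]


def welch_t(a, b):
    """Welch t-statistic for one feature in two groups (each needs >= 2 values)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def auc(scores, labels):
    """Probability that a random positive outscores a random negative; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy sequences and labels, invented for illustration only (1 = essential).
    toy = [("ATGGCGGCCGCGTAA", 1), ("ATGGCCGGAGCATAA", 1),
           ("ATGAAAGCTATATAA", 0), ("ATGTTTAAATTATAA", 0)]
    feats = [composition_features(s) for s, _ in toy]
    labels = [y for _, y in toy]
    gc = 4 * BASES.index("G") + BASES.index("C")  # position of the "GC" dinucleotide
    scores = [f[gc] for f in feats]
    print("Welch t for GC:", welch_t(scores[:2], scores[2:]))
    print("AUC using GC frequency as score:", auc(scores, labels))
```

In a fuller version, features whose absolute t-statistic falls below some chosen threshold would be dropped before training a classifier, and `auc` would be applied to the classifier's decision scores rather than to a single raw feature.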
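[Degree-granting institution]: University of Electronic Science and Technology of China (電子科技大學)
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: Q933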
[Citation]: 鄧炎炎; Prediction of Bacterial Essential Genes and Analysis of Their Evolutionary Characteristics [D]; University of Electronic Science and Technology of China (電子科技大學); 2016