Research on Algorithms for Predicting Regulatory Motifs and Regulons in Prokaryotes

Published: 2018-05-28 02:23

Topic: regulatory motif + regulon prediction; Source: Shandong University, doctoral dissertation, 2014

[Abstract]: Bioinformatics is a rapidly developing interdisciplinary field that combines knowledge from biology, mathematics, and computer science to analyze biological data and study life phenomena. Sequence analysis is an important part of bioinformatics, and DNA sequence motif prediction has long been one of its central research problems. The prediction of transcription factor binding sites in particular is both biologically significant and algorithmically difficult. This dissertation studies algorithms for predicting the regulatory motifs and regulons that govern gene expression in prokaryotes.
A gene must be expressed as its corresponding protein to carry out its biological function, and its expression must be regulated in response to different internal and external conditions. In prokaryotes, expression is regulated mainly through interactions between RNA polymerase and regulatory proteins. A regulatory protein recognizes specific sequence segments in the genomic DNA and binds to them to exert its regulatory effect; these specific sequences are called regulatory protein binding sites. A genome therefore contains not only the gene sequences that encode proteins and RNA but also the regulatory sequences that control gene expression. The binding sites of a given regulatory protein generally have the same length and are highly conserved in sequence; this conserved sequence pattern is called a cis-regulatory motif. In prokaryotes, several consecutive genes on the genome often form an operon and are transcribed together; a single gene can also be regarded as a special case of an operon. The set of operons regulated by the same regulatory protein is called a regulon.
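The idea of a motif as a conserved pattern over equal-length binding sites can be illustrated with a position frequency matrix and a per-column information content. The following is a minimal sketch, not the representation used in the dissertation: the function names, the toy sites, and the uniform background over A, C, G, T (which gives a maximum of 2 bits per column) are all assumptions made for illustration.

```python
from math import log2


def position_frequencies(sites, alphabet="ACGT"):
    """Return one {base: frequency} dict per column of the aligned sites."""
    if not sites:
        raise ValueError("at least one binding site is required")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all binding sites must have the same length")
    count = len(sites)
    # zip(*sites) walks the alignment column by column.
    return [{base: column.count(base) / count for base in alphabet}
            for column in zip(*sites)]


def information_content(frequencies):
    """Per-column information in bits, assuming a uniform 4-letter background."""
    return [2.0 + sum(p * log2(p) for p in column.values() if p > 0)
            for column in frequencies]


# Hypothetical aligned binding sites: three conserved columns, one random one.
toy_sites = ["ACGT", "ACGA", "ACGC", "ACGG"]
toy_freqs = position_frequencies(toy_sites)
toy_bits = information_content(toy_freqs)
```

Fully conserved columns score 2 bits and a column with all four bases equally represented scores 0 bits, so the total over columns gives a rough measure of how conserved a candidate motif is.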
In this dissertation, we first briefly introduce the model representations of regulatory motifs and the algorithms for predicting them. Building on existing motif prediction algorithms and on the distribution of regulatory binding sites across whole prokaryotic genomes, we designed a method for assessing the biological significance of predicted motifs, which allows the predicted motifs to be filtered accurately. We use the information content and conservation of motifs for motif similarity analysis and clustering, and we use statistical tools such as the hypergeometric distribution to analyze the co-occurrence of motifs across the whole genome. Together these methods form the motif prediction and analysis toolkit BoBro2.0; the software can be downloaded and used free of charge at http://code.google.com/p/bobro/.
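A hypergeometric test for motif co-occurrence can be sketched as follows. This is only an illustration of the general statistical idea and is not taken from BoBro2.0: the function name `hypergeom_tail`, the framing in terms of promoter regions, and the example counts are assumptions. Given N promoter regions, of which K contain motif A and n contain motif B, the tail probability of seeing at least k regions with both motifs under independence is computed directly from binomial coefficients.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) where X counts successes in n draws without replacement
    from a population of N items containing K successes."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    lower = max(k, 0)
    upper = min(K, n)
    total = comb(N, n)
    # math.comb(a, b) returns 0 when b > a, so impossible terms vanish.
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(lower, upper + 1))
    return favourable / total


# Hypothetical counts: 2000 promoter regions, motif A found in 40 of them,
# motif B found in 50, and both motifs found together in 10.
p_value = hypergeom_tail(10, 2000, 40, 50)
```

A small tail probability suggests that the two motifs appear together more often than chance alone would explain, which is the kind of evidence a co-occurrence analysis looks for.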
Combining motif prediction with phylogenetic footprinting, we designed a new method for genome-wide regulon prediction. Phylogenetic footprinting allows regulatory motifs to be discovered in the regulatory regions of orthologous genes, but its results often have a very high false-positive rate. To overcome this problem, we designed a bipartite-graph-based method for comparing motif similarity. It performs a preliminary screening of all motifs and produces a score that reflects the co-regulation relationship between operons: if two operons have a high score, they are more likely to belong to one or more of the same regulons. We keep only the motifs that yield high scores and use them to construct a motif similarity graph, in which each motif is a vertex and each significant similarity score is an edge, so that the whole graph reflects the similarity relationships among the predicted motifs. By analyzing the vertex sets in this graph that correspond to known regulons, we found that the subgraphs induced by these vertex sets have a higher edge density and clustering coefficient than the original graph, and thus reflect the characteristics of prokaryotic regulons. Using this finding, we designed a clustering algorithm that recovers from the graph the operon sets corresponding to real regulons. Compared with two other scores that reflect co-regulation, our score reflects co-regulation relationships more accurately. Moreover, because we predict regulons using motifs as vertices, the method handles the problem that overlaps between regulons make clustering inaccurate, and so predicts regulons more accurately. Our prediction pipeline is based entirely on genome sequence data and does not require much biological annotation as supporting information, which makes it especially useful for newly sequenced genomes.
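The two graph statistics mentioned above, edge density and clustering coefficient of an induced subgraph, can be computed with a few lines of code. The sketch below uses the standard textbook definitions (density as edges over possible vertex pairs, and the average local clustering coefficient); whether the dissertation uses exactly these variants is an assumption, and the motif names and edges are made up for illustration.

```python
from itertools import combinations


def build_graph(edges):
    """Undirected graph as {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def induced_subgraph(adj, vertices):
    """Keep only the given vertices and the edges among them."""
    keep = set(vertices)
    return {u: adj.get(u, set()) & keep for u in keep}


def edge_density(adj):
    """Number of edges divided by the number of vertex pairs."""
    n = len(adj)
    if n < 2:
        return 0.0
    m = sum(len(nb) for nb in adj.values()) / 2
    return m / (n * (n - 1) / 2)


def average_clustering(adj):
    """Mean local clustering coefficient; vertices of degree < 2 count as 0."""
    if not adj:
        return 0.0
    total = 0.0
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue
        linked = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
        total += linked / (k * (k - 1) / 2)
    return total / len(adj)


# Hypothetical motif similarity graph: m1, m2, m3 stand in for a known regulon.
graph = build_graph([("m1", "m2"), ("m1", "m3"), ("m2", "m3"),
                     ("m3", "m4"), ("m4", "m5")])
regulon = induced_subgraph(graph, ["m1", "m2", "m3"])
```

In this toy graph the induced subgraph on m1, m2, m3 is a triangle, so both of its statistics are higher than those of the full graph, mirroring the pattern the abstract reports for known regulons.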
To make our algorithms and tools convenient for biologists to use, we developed DOOR2.0, an online database centered on operon data. It contains the operon structures of 2072 completely sequenced prokaryotic genomes, together with gene function annotations and experimentally verified regulatory protein binding sites. Compared with the previous version published in 2009, DOOR2.0 has a number of new features: (i) it contains 250000 transcription unit structures, either experimentally verified or computationally predicted from RNA-seq data, providing a dynamic view of operon function; (ii) it integrates operon-centered data resources, offering for each genome not only the operon structures but also functional and regulatory information such as cis-regulatory binding sites, promoters, and terminators; (iii) it provides an efficient web service for operon prediction on user-supplied genomes; (iv) it visualizes user-selected data in an intuitive genome browser; and (v) it offers a keyword-based search engine, similar to Google search, for quickly finding information in the database. The database is updated as new sequencing data are released and can be accessed at http://csbl.bmb.uga.edu/DOOR/; all data and functions are provided to users free of charge. Finally, using a range of comparative genomics methods together with our motif analysis tools, we carried out a systematic analysis of 40 species of Clostridium, with particular attention to genes and functions related to biomass degradation. These studies not only produced findings of biological research value but also demonstrated the practical value of the methods we developed.
[Degree-granting institution]: Shandong University
[Degree level]: Doctorate
[Year degree granted]: 2014
[Classification number]: Q811.4


Article ID: 1944848


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1944848.html
