
Application of High-Order Markov Models to Phylogenetic Tree Reconstruction and Motif Discovery

Published: 2019-06-15 16:23
[Abstract]: Traditional biological sequence analysis is built on sequence alignment, which has inherent limitations: there is no unified standard for choosing nucleotide or amino-acid substitution matrices; alignment fails for highly divergent sequences such as gene regulatory sequences; and, because of its heavy time cost, alignment-based analysis is impractical for the massive data produced by next-generation sequencing. In the post-genomic era, biological sequence analysis therefore urgently needs faster and more efficient alignment-free methods. The Markov model is an important model for describing stochastic processes and has a long history in biological sequence analysis; for example, many classical methods for CpG island identification and gene finding use Markov models. Past work, however, has mostly used low-order Markov models. This thesis discusses applications of high-order Markov models in biological sequence analysis. The main contributions are as follows.

1. Markov Shannon entropy maximization (MME) order selection. Markov models are widely used in biological sequence analysis, but the problem of identifying their order has received little attention; the order is usually inferred with the χ² statistic or identified with the AIC/BIC information criteria. When a high-order Markov model is used for sequence comparison, the model should capture as much of the information in the sequence as possible. We propose, for the first time, the Markov Shannon entropy maximization (MME) method for order selection. Tests on several datasets show that the orders it identifies are higher than those identified by AIC/BIC and that it has a clear advantage in biological sequence comparison.

2. One-dimensional chaos game representation. The chaos game representation of DNA sequences proposed by Jeffrey, which is based on function iteration, is a one-to-one two-dimensional graphical representation. It maps a DNA sequence to a set of points in the unit square, so that frequency differences among oligomers of different lengths appear as differences in point density across regions of the scatter plot, and hierarchical combinatorial preferences among oligomers appear as fractal features of the plot. The chaos game representation has therefore been widely used to characterize DNA sequences. Jeffrey's chaos game, however, is tailor-made for DNA and can only handle sequences over a correspondingly small alphabet. The one-dimensional chaos game representation is a one-to-one numerical representation based on a similar function iteration: it maps a symbolic sequence over any finite alphabet to a sequence of values in the unit interval of the real line. It can handle not only DNA and RNA sequences but also protein sequences over the 20 amino acids, and even English text over 26 letters. Apart from the visualization effect, it inherits all other features of Jeffrey's chaos game. We give, for the first time, an inversion formula for the one-dimensional chaos game representation and a structural index for the k-string representation of biological sequences, and we discuss the relationship between the one-dimensional chaos game representation and high-order Markov models. The two key problems in applying high-order Markov models are order identification and the estimation of a large number of parameters; these properties of the one-dimensional chaos game representation help with both.

3. Phylogenetic tree reconstruction. Traditionally, a phylogenetic tree is built from biological sequences by aligning a chosen gene under the molecular clock hypothesis, deriving evolutionary distances between genes from a nucleotide or amino-acid substitution matrix, and constructing a gene tree. Such genes are generally highly conserved, for example 16S rRNA and 18S rRNA, but in many cases gene trees based on different genes are inconsistent. Because of the limitations of alignment-based methods, many alignment-free methods have appeared. The widely used composition vector (CV) method takes the frequencies of words of a fixed length as the feature vector of a genome or proteome, with background probabilities obtained from a high-order Markov model. Inspired by this, we propose, for the first time, representing a whole proteome or whole genome directly by a high-order Markov model and using the corresponding transition probability matrix as the feature vector of the sequence. The order is identified with our newly proposed MME method. Results on several whole-proteome and whole-genome datasets confirm that this alignment-free method of phylogenetic tree reconstruction is effective.

4. Motif discovery. Genes are the basic units of genetic information in DNA sequences. Gene transcription and expression are influenced and controlled by transcription factors that bind to binding sites in gene regulatory elements (promoters, enhancers, silencers, and so on). These binding sites are relatively fixed, recurring DNA sequence patterns 5-20 bp long, called motifs. Understanding gene expression is a major challenge in biology, and identifying gene regulatory elements, motifs in particular, is an important part of it. Inspired by the method of Tompa et al., we propose a new k-string method that uses high-order Markov models. A high-order Markov model is first fitted to the background sequence set; under this background model, the expected count of each k-string in the sequence set is determined. The relative deviation of the observed count from the expected count is then used to judge whether a k-string comes from the random background or is an instance of a motif. Experiments on several HT-SELEX datasets confirm the effectiveness of this k-string method.
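The abstract names AIC/BIC as the usual baseline for choosing the order of a Markov model but does not define the MME criterion itself, so the following is only a minimal sketch of the textbook BIC baseline, not of the thesis's method. The function names, the default alphabet size of 4, and the choice to condition every candidate order on the same first `max_order` symbols are assumptions made for illustration.

```python
import math
from collections import Counter


def markov_log_likelihood(seq, k, start):
    """Maximized log-likelihood of an order-k Markov chain on seq[start:].

    Every position i >= start is predicted from its k preceding symbols,
    so all candidate orders are scored on the same set of positions.
    """
    pair_counts = Counter()     # counts of (context, next symbol)
    context_counts = Counter()  # counts of context
    for i in range(start, len(seq)):
        context = seq[i - k:i]
        pair_counts[(context, seq[i])] += 1
        context_counts[context] += 1
    return sum(n * math.log(n / context_counts[context])
               for (context, _), n in pair_counts.items())


def bic_order(seq, max_order, alphabet_size=4):
    """Return the order in 0..max_order with the smallest BIC (assumed baseline)."""
    n_obs = len(seq) - max_order
    best_order, best_bic = 0, math.inf
    for k in range(max_order + 1):
        log_lik = markov_log_likelihood(seq, k, start=max_order)
        # Free parameters: one distribution over the alphabet per context.
        n_params = (alphabet_size ** k) * (alphabet_size - 1)
        bic = -2.0 * log_lik + n_params * math.log(n_obs)
        if bic < best_bic:  # strict comparison keeps the lowest order on ties
            best_order, best_bic = k, bic
    return best_order
```

For a strictly periodic sequence such as `"ACGT" * 50`, each symbol is fully determined by its predecessor, so `bic_order("ACGT" * 50, 3)` selects order 1. The penalty term grows as `alphabet_size ** k`, which is one reason BIC tends to pick low orders.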
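The one-dimensional chaos game representation is described in the abstract only in words, so the iteration below is an assumed form, not the thesis's formula: each symbol is given an index in 0..m-1 and the point is updated as x_i = (x_{i-1} + index) / m, starting from 0. Under that assumption the map is one-to-one and can be inverted by reading off base-m digits. The names `cgr_1d` and `invert_cgr_1d` are hypothetical, and exact rational arithmetic is used so that inversion does not suffer from floating-point rounding.

```python
from fractions import Fraction


def cgr_1d(seq, alphabet):
    """Map a symbolic sequence to points in [0, 1) (assumed iteration)."""
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    m = len(alphabet)
    x = Fraction(0)  # assumed starting point
    points = []
    for symbol in seq:
        x = (x + index[symbol]) / m
        points.append(x)
    return points


def invert_cgr_1d(x, length, alphabet):
    """Recover the sequence of the given length that ends at point x."""
    m = len(alphabet)
    symbols = []
    for _ in range(length):
        scaled = x * m
        i = int(scaled)        # integer part is the index of the latest symbol
        symbols.append(alphabet[i])
        x = scaled - i         # fractional part is the previous point
    return "".join(reversed(symbols))
```

With this iteration the first k base-m digits of a point are the last k symbols read, most recent first, so all positions that end in the same k-string fall in the same subinterval of width m^-k. That is the sense in which such a representation connects to order-k context counts.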
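The k-string scoring step in item 4 can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: the background model is estimated from the same sequence set by simple counting, the probability of a k-string is taken as the frequency of its first `order` symbols times the chain of conditional probabilities, and the score is (observed - expected) / expected. The function names and the requirement `order < k` are hypothetical choices.

```python
from collections import Counter


def kmer_counts(seqs, k):
    """Count every length-k window across all sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def relative_deviation(seqs, k, order):
    """Score each observed k-string against an order-`order` background.

    Returns {k-string: (observed - expected) / expected}. Assumes order < k.
    """
    observed = kmer_counts(seqs, k)
    transitions = kmer_counts(seqs, order + 1)  # context followed by one symbol
    context_totals = Counter()
    for word, n in transitions.items():
        context_totals[word[:-1]] += n
    starts = kmer_counts(seqs, order)           # frequency of the leading context
    n_starts = sum(starts.values())
    n_windows = sum(observed.values())

    scores = {}
    for word, n_obs in observed.items():
        prob = starts[word[:order]] / n_starts
        for i in range(order, k):
            context = word[i - order:i]
            prob *= transitions[context + word[i]] / context_totals[context]
        expected = prob * n_windows
        scores[word] = (n_obs - expected) / expected
    return scores
```

For example, with `seqs = ["ACACACAC"]`, `k = 2` and `order = 0`, the expected count of `AC` is 0.25 × 7 = 1.75 against 4 observed, giving a relative deviation of 9/7. A large positive score marks a k-string as over-represented relative to the background; the threshold that separates motif instances from background is not given in the abstract and is left open here.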
[Degree-granting institution]: Xiangtan University
[Degree level]: Doctoral
[Year conferred]: 2016
[Classification number]: Q811.4

Yang Weifeng (陽衛鋒); Application of High-Order Markov Models to Phylogenetic Tree Reconstruction and Motif Discovery [D]; Xiangtan University; 2016

Article ID: 2500351



Article link: http://sikaile.net/shoufeilunwen/jckxbs/2500351.html

