Sum Maximum Likelihood Estimation of Phylogenetic Trees
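Published: 2018-01-08 21:33
Keywords: sum maximum likelihood estimation of phylogenetic trees. Source: Shandong University, master's thesis (2017). Type: degree thesis.
Related topics: phylogenetic tree; sum maximum likelihood estimation; bioinformatics; sequence analysis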
[Abstract]: Phylogenetic analysis is an important topic in bioinformatics. As molecular data continue to accumulate, researchers pay increasing attention to the information these data contain. Phylogenetic trees are usually built from nucleotide or protein sequences, most commonly by maximum likelihood, maximum parsimony, or distance-matrix methods. Maximum likelihood and maximum parsimony infer the tree directly from the sequences, whereas distance-matrix methods build it from the pairwise distances between sequences, treating the two sequences with the fewest changes between them as neighbors. All of these methods estimate the tree's topology and branch lengths; ideally they would all produce the same tree, but in practice they often do not. Of these approaches, maximum likelihood estimation, which is grounded in probability theory, tends to be the most accurate. It must, however, compute a probability for every candidate topology, and the number of topologies to examine grows enormously as more taxa (sequences) are added, so repeating the calculation for each one is extremely expensive; the problem has been proved NP-hard. In most cases the globally optimal tree therefore cannot be obtained, although heuristic search can find a good estimate. B. B. Zhou et al., for example, implemented this search with a parallel algorithm, increasing both the search speed and the portion of tree space searched. This thesis studies the estimation of branch lengths, using sum maximum likelihood as the criterion and particle swarm optimization to optimize the branch lengths. It builds on the geometric space of phylogenetic trees introduced by Billera et al., in which each topology corresponds to an orthant. Nucleotide substitution at each site is assumed to follow a Markov process. Under these assumptions, we compute the sum of the likelihood functions over all sites and estimate the branch lengths from it.

The phylogenetic tree is important for other areas of biological research: it provides a basis for exploring the origin of species and molecular evolution and, through them, gene function. Phylogenetic analysis also offers important guidance for controlling viruses and diagnosing disease. Developing methods for estimating phylogenetic trees is therefore highly worthwhile.
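The abstract says distance-matrix methods build the tree from pairwise distances and treat the two sequences with the fewest changes as neighbors. A minimal Python sketch of that first step follows, assuming the Jukes-Cantor (JC69) correction for multiple substitutions; the thesis does not name a distance measure, and the function names and toy alignment are hypothetical. Picking the raw closest pair is the first merge of UPGMA-style clustering; neighbor-joining instead joins the pair that minimizes its Q criterion.

```python
import math
from itertools import combinations


def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jc69_distance(s1: str, s2: str) -> float:
    """Jukes-Cantor (JC69) corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(s1, s2)
    if p >= 0.75:
        return math.inf  # correction undefined: the pair looks saturated
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def closest_pair(seqs: dict[str, str]) -> tuple[str, str]:
    """Pair of taxa with the smallest JC69 distance (the pair joined first)."""
    return min(
        combinations(seqs, 2),
        key=lambda pair: jc69_distance(seqs[pair[0]], seqs[pair[1]]),
    )


if __name__ == "__main__":
    # Hypothetical toy alignment, invented for illustration only.
    seqs = {
        "A": "ACGTACGTAC",
        "B": "ACGTACGTTC",
        "C": "ACGAACTTTC",
        "D": "TCGAACTTTG",
    }
    for x, y in combinations(seqs, 2):
        print(x, y, round(jc69_distance(seqs[x], seqs[y]), 4))
    print("closest pair:", closest_pair(seqs))
```

In this toy alignment A and B differ at one site in ten, so they form the closest pair.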
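To make the growth in candidate topologies concrete: a standard counting result (not taken from the thesis) gives (2n - 5)!! = 3 * 5 * ... * (2n - 5) distinct unrooted binary topologies for n >= 3 labelled taxa. The count depends on the number of taxa, not on the number of alignment sites. A small sketch; the function name is ours.

```python
def num_unrooted_topologies(n: int) -> int:
    """Number of unrooted, fully resolved (binary) topologies on n labelled taxa.

    Uses the double factorial (2n - 5)!! = 3 * 5 * ... * (2n - 5), valid for n >= 3.
    """
    if n < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count


if __name__ == "__main__":
    for n in (4, 10, 20, 50):
        print(f"{n:>3} taxa: {num_unrooted_topologies(n):,} topologies")
```

For example, 10 taxa already give 2,027,025 topologies, which is why evaluating the likelihood of every topology quickly becomes infeasible and heuristic search is used instead.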
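The abstract models substitution at each site as a Markov process, scores branch lengths by the sum of the per-site likelihoods, and optimizes them with particle swarm optimization. The sketch below illustrates that pipeline under assumptions of ours: the Jukes-Cantor (JC69) substitution model, a single unrooted three-taxon tree (one topology, so only the three branch lengths vary), and a textbook global-best PSO with inertia weight `w` and acceleration constants `c1`, `c2`. For contrast it also computes the conventional ML criterion, the sum of log site likelihoods. All function names, parameter values, and toy columns are hypothetical; the thesis's actual model, tree sizes, and PSO settings are not given in the abstract.

```python
import math
import random

BASES = "ACGT"
PI = 0.25  # JC69 stationary frequency of each base


def jc69_prob(i: str, j: str, t: float) -> float:
    """JC69 probability that base i is replaced by base j along a branch of
    length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def site_likelihood(column: str, t: list[float]) -> float:
    """Likelihood of one alignment column (bases of taxa 1, 2, 3) on the unrooted
    3-taxon tree whose single internal node joins branches t[0], t[1], t[2]."""
    total = 0.0
    for x in BASES:  # sum over the unobserved base at the internal node
        term = PI
        for base, tk in zip(column, t):
            term *= jc69_prob(x, base, tk)
        total += term
    return total


def sum_of_site_likelihoods(columns: list[str], t: list[float]) -> float:
    """Criterion as worded in the abstract: sum over sites of L_s(t)."""
    return sum(site_likelihood(c, t) for c in columns)


def log_likelihood(columns: list[str], t: list[float]) -> float:
    """Conventional ML criterion, for comparison: sum over sites of log L_s(t)."""
    return sum(math.log(site_likelihood(c, t)) for c in columns)


def pso_maximize(f, dim, lo, hi, n_particles=20, iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization of f over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = max(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp so every branch length stays inside [lo, hi].
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


if __name__ == "__main__":
    # Hypothetical toy alignment columns for three taxa, invented for illustration.
    columns = ["AAA", "CCC", "GGG", "TTT", "AAG", "CCT", "GTG", "TAT", "ACA", "GGA"]

    def objective(t):
        return sum_of_site_likelihoods(columns, t)

    best_t, best_val = pso_maximize(objective, dim=3, lo=1e-6, hi=2.0)
    print("branch lengths:", [round(x, 4) for x in best_t])
    print("sum of site likelihoods:", round(best_val, 6))
    print("log-likelihood at that point:", round(log_likelihood(columns, best_t), 4))
```

The lower bound `lo=1e-6` is our choice: it keeps every site likelihood positive so the log-likelihood comparison stays finite. Because a plain sum of site probabilities weights the most probable column patterns heavily, its maximizer can sit on the boundary of the search box; the sketch shows the mechanics only, not the thesis's settings or results.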
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number (CLC)]: Q811.4; O212.1
Article ID: 1398783
Article link: http://sikaile.net/shoufeilunwen/benkebiyelunwen/1398783.html