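Improved DNA Sequence Clustering and Reliability Analysis Based on Graphical Representation
Published: 2018-01-20 04:17
Keywords: graphical representation; cluster analysis; phylogenetic tree; Bootstrap  Source: Zhejiang Sci-Tech University, 2017 master's thesis  Document type: degree thesis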
[Abstract]: Graphical representation has become an important tool for studying biological sequences because it visualizes a sequence well and preserves local information. Combining graphical representation with cluster analysis is an effective way to study the evolutionary relationships between sequences. Two questions remain open, however: how to construct a more effective graphical representation, and how to assess the reliability of a clustering more accurately. This thesis studies both the graphical representation and the reliability assessment method. Its contents are as follows.
1. The thesis constructs a simplified space curve for DNA sequences based on the H curve. Even for long DNA sequences, the curve does not drift away from the center line, and it avoids overlap and self-crossing. The representation is convenient, intuitive to read, and well suited to geometric feature analysis.
2. On top of the simplified space curve, the thesis uses geometric features of the curve (estimates of curvature and torsion) to build a feature description of each DNA sequence. An improved distance measure between sequences is used to build a distance matrix, and that matrix is then used for cluster analysis and for constructing a phylogenetic tree that displays the clustering result.
3. Applying the Bootstrap method directly in biology has two drawbacks. First, it disregards the fact that organisms evolve gradually and assumes that every sample is equally likely. Second, it ignores the correlation between bases within a DNA sequence and assumes that the bases are mutually independent. Building on Bootstrap, the thesis proposes an improved method for assessing the reliability of DNA sequence clustering. The method first randomly draws a fixed proportion of the bases in the original DNA sequence and then replaces each drawn base using a genetic algorithm. The thesis applies this improved method to assess the reliability of the evolutionary trees built by clustering. The experiments show that the accuracy of the reliability assessment improves, which indicates that the method is feasible and effective.
The thesis builds the distance matrix with the proposed graphical representation and improved distance measure, assesses the clustering results derived from that matrix with the improved reliability assessment method, and compares them with the results of other related methods. In the experiments, the proposed improvements outperform the methods they are compared against. The thesis closes with a summary of the work and an outlook on the problems that need further study.
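The resampling step in point 3 (draw a fixed proportion of bases, then replace each drawn base) can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not describe the genetic algorithm that performs the replacement, so the sketch swaps in a plain uniform substitution as a labeled placeholder. The names `perturb_sequence`, `make_replicates`, and `fraction` are hypothetical.

```python
import random

BASES = "ACGT"


def perturb_sequence(seq, fraction, rng):
    """Return a copy of seq with a random fraction of its positions replaced."""
    n_pick = round(len(seq) * fraction)
    # Draw distinct positions, so exactly n_pick bases are touched.
    positions = rng.sample(range(len(seq)), n_pick)
    out = list(seq)
    for pos in positions:
        # ASSUMPTION: placeholder rule. The thesis replaces each drawn base
        # with a genetic algorithm; here we pick a different base uniformly.
        alternatives = [b for b in BASES if b != out[pos]]
        out[pos] = rng.choice(alternatives)
    return "".join(out)


def make_replicates(seqs, fraction, n_replicates, seed=0):
    """Build n_replicates perturbed copies of a {name: sequence} dict."""
    rng = random.Random(seed)
    return [
        {name: perturb_sequence(s, fraction, rng) for name, s in seqs.items()}
        for _ in range(n_replicates)
    ]
```

In a reliability assessment, each replicate would be clustered again and the share of replicates that recover a given group would serve as that group's support value. Unlike the classical Bootstrap, which resamples whole columns with replacement, this scheme leaves most of each sequence intact.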
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: Q811.4; TP311.13
Article ID: 1446812
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1446812.html