Research on Exon Prediction Based on Digital Signal Processing Theory and Methods
Source: Nankai University, doctoral dissertation, 2014. Type: degree thesis
Keywords: bioinformatics; exon prediction; digital signal processing; singularity detection; empirical mode decomposition
[Abstract]: Fast, reliable and accurate prediction of exon positions in eukaryotic DNA sequences is an important problem in bioinformatics. Accurately locating short exons is one of the main difficulties in predicting both the positions and the number of exons. Effectively suppressing the background noise generated by intron regions therefore plays an important role in improving prediction accuracy for short exons.

In eukaryotic genes, the relatively few protein-coding exons are separated by non-coding introns into many discontinuous segments, and a large proportion of exons are very short. Short exons lack distinctive features, which makes them very difficult to predict accurately. Moreover, the coding information carried by some short exons plays an important role at various stages of tumor invasion and metastasis. This thesis proposes two exon prediction methods. They improve prediction accuracy for short exons in two respects: capturing the features of short exons and suppressing the background noise of intron regions.
Depending on their principles and characteristics, current exon prediction methods fall into two main categories: methods based on digital signal processing and methods based on databases. This thesis builds on singularity detection via wavelet transform modulus maxima and on empirical mode decomposition to develop two exon prediction methods. The work is summarized as follows:
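For orientation, the digital-signal-processing category is typified by the classic period-3 measure. Protein-coding regions tend to show a spectral peak at frequency 1/3 when the DNA string is mapped to four binary indicator sequences. The sketch below is a minimal illustration of that baseline only; it is not either of the thesis's two methods. The window length and the function names `voss_indicators` and `period3_spectrum` are illustrative assumptions.

```python
import numpy as np


def voss_indicators(seq: str) -> np.ndarray:
    """Map a DNA string to a 4 x N binary matrix (rows: A, C, G, T)."""
    seq = seq.upper()
    return np.array([[1.0 if base == b else 0.0 for base in seq] for b in "ACGT"])


def period3_spectrum(seq: str, window: int = 120) -> np.ndarray:
    """Sliding-window spectral power at frequency 1/3, summed over A, C, G, T.

    Positions near either end reuse the first or last full window; a sequence
    shorter than the window is analysed as a single segment.
    """
    x = voss_indicators(seq)
    n = x.shape[1]
    # DFT basis at frequency 1/3 cycles per base.
    basis = np.exp(-2j * np.pi * np.arange(window) / 3)
    power = np.empty(n)
    for center in range(n):
        start = max(0, min(center - window // 2, n - window))
        seg = x[:, start:start + window]
        coeffs = seg @ basis[: seg.shape[1]]  # one DFT coefficient per nucleotide
        power[center] = float(np.sum(np.abs(coeffs) ** 2))
    return power
```

Peaks in the returned profile mark candidate coding regions. The fluctuations it shows inside introns are the kind of background noise the abstract refers to.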
(1) An exon prediction method based on a wavelet transform modulus maxima singularity detection algorithm. The method first constructs a nucleotide distribution sequence. It then tracks how the wavelet transform modulus maxima of this sequence propagate across scales. This effectively separates the exon signal from the noise produced by introns, while reconstructing the abrupt signal changes produced by short exons with high accuracy, so that short exons are detected accurately.

HMR195 and BG570 are two widely used benchmark data sets for evaluating exon prediction methods. The thesis uses both to evaluate the singularity detection method's performance on short exons and its overall prediction performance. Compared with the main existing prediction methods, the results on HMR195 and BG570 show three improvements:
1) for short exons of at most 50 base pairs (bp) and of at most 200 bp, the detection rate increases by at least 12% and 8%, respectively;
2) for exon prediction as a whole, accuracy increases by at least 6.8%;
3) in suppressing intron background noise, the signal-to-noise ratio (SNR) increases by at least 74.5%.
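A minimal sketch of the modulus-maxima idea in (1). It assumes a first-derivative-of-Gaussian wavelet and dyadic scales 1 to 16; the abstract specifies neither. Because the abstract does not define the nucleotide distribution sequence, the sketch accepts any real-valued sequence.

The sketch computes the wavelet transform at each scale, finds the modulus maxima, and keeps only the maxima that can be chained from the coarsest scale down to the finest. Sharp transitions such as exon boundaries persist across scales, while maxima caused by noise shrink and break off. The thesis also reconstructs the signal from the retained maxima; that step is omitted here. All function names and the threshold value are hypothetical.

```python
import numpy as np


def dog_wavelet(scale: float) -> np.ndarray:
    """First derivative of a Gaussian, sampled at `scale` and L1-normalized."""
    half = int(np.ceil(4 * scale))
    t = np.arange(-half, half + 1) / scale
    psi = -t * np.exp(-t ** 2 / 2)
    return psi / np.sum(np.abs(psi))


def wavelet_transform(signal: np.ndarray, scales) -> np.ndarray:
    """One row of wavelet coefficients per scale, same length as the signal."""
    rows = []
    for s in scales:
        psi = dog_wavelet(s)
        if psi.size > signal.size:
            raise ValueError("scale too large for this signal length")
        rows.append(np.convolve(signal, psi, mode="same"))
    return np.array(rows)


def modulus_maxima(row: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Positions where |row| is a local maximum along the sequence and exceeds threshold."""
    m = np.abs(row)
    is_max = (m[1:-1] > m[:-2]) & (m[1:-1] >= m[2:]) & (m[1:-1] > threshold)
    return np.nonzero(is_max)[0] + 1


def persistent_singularities(signal, scales=(1, 2, 4, 8, 16), coarse_threshold=0.25):
    """Finest-scale positions of maxima lines that survive from the coarsest scale."""
    signal = np.asarray(signal, dtype=float)
    scales = sorted(scales)
    wt = wavelet_transform(signal, scales)
    maxima = [modulus_maxima(row) for row in wt]
    found = set()
    for pos in modulus_maxima(wt[-1], coarse_threshold):
        current = int(pos)
        for j in range(len(scales) - 2, -1, -1):
            # Follow the line to the nearest maximum at the next finer scale;
            # the search tolerance widens with scale (cone of influence).
            tol = max(2, int(scales[j + 1]))
            near = maxima[j][np.abs(maxima[j] - current) <= tol]
            if near.size == 0:
                break  # line broken off: treat as noise
            current = int(near[np.argmin(np.abs(near - current))])
        else:
            found.add(current)
    return sorted(found)
```

With L1 normalization, a unit step yields a modulus of about 0.5 at every scale, whereas white-noise coefficients decay roughly as 1/sqrt(scale). This is why a single threshold at the coarsest scale separates the two in this sketch.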
(2) To extend the scope of the singularity detection method, 200 test cases were randomly selected from the NCBI GenBank database. Each case contains one short intron and the two adjacent short exons it separates. On the exons in these 200 test cases, the singularity detection method improves prediction accuracy by at least 20.7% over the main existing prediction methods.
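The abstract reports gains in accuracy, detection rate and SNR without defining them. As a hedged illustration, the sketch below scores predictions at the nucleotide level with the sensitivity and specificity customary in gene-prediction benchmarks: SN = TP/(TP+FN) and SP = TP/(TP+FP).

It adds two assumed definitions that may differ from the thesis:
- a short-exon detection rate, in which an exon of at most `max_len` bp counts as detected if predicted exons cover at least half of it;
- an SNR equal to the mean score over exon positions divided by the mean over intron positions.

All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based coordinates


def coverage_mask(intervals: Sequence[Interval], length: int) -> List[bool]:
    """True at every position covered by at least one interval."""
    mask = [False] * length
    for start, end in intervals:
        for i in range(max(0, start), min(length, end)):
            mask[i] = True
    return mask


def nucleotide_sn_sp(true_exons, pred_exons, length):
    """Nucleotide-level sensitivity and specificity."""
    t = coverage_mask(true_exons, length)
    p = coverage_mask(pred_exons, length)
    tp = sum(a and b for a, b in zip(t, p))
    fn = sum(a and not b for a, b in zip(t, p))
    fp = sum(b and not a for a, b in zip(t, p))
    sn = tp / (tp + fn) if tp + fn else math.nan
    sp = tp / (tp + fp) if tp + fp else math.nan
    return sn, sp


def short_exon_detection_rate(true_exons, pred_exons, length, max_len=50, min_cover=0.5):
    """Hypothetical criterion: a short exon is detected if >= min_cover of it is predicted."""
    p = coverage_mask(pred_exons, length)
    short = [(s, e) for s, e in true_exons if e - s <= max_len]
    if not short:
        return math.nan
    hits = sum(sum(p[s:e]) >= min_cover * (e - s) for s, e in short)
    return hits / len(short)


def signal_to_noise(score: Sequence[float], true_exons) -> float:
    """Assumed SNR: mean score on exon positions / mean score on intron positions.

    Requires at least one exon position and one intron position.
    """
    t = coverage_mask(true_exons, len(score))
    exon = [v for v, inside in zip(score, t) if inside]
    intron = [v for v, inside in zip(score, t) if not inside]
    return (sum(exon) / len(exon)) / (sum(intron) / len(intron))
```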
(3) An exon prediction method based on empirical mode decomposition (EMD) and a modified Gabor wavelet transform. The method maps the DNA sequence to numbers using DNA bending stiffness. It then applies EMD to decompose the numerical sequence into several intrinsic mode functions (IMFs) and computes the local power spectrum of the first IMF with a modified Gabor wavelet transform. Because EMD is an adaptive tool for non-stationary signals, the method can detect short-exon features that traditional methods cannot observe. In addition, because only the local power spectrum of the first IMF is computed, the method has an advantage in noise suppression.

Compared with the main existing prediction methods, the results on the HMR195 data set show two improvements:
1) the SNR of exon prediction increases by at least 20.8%;
2) for short exons of at most 50 bp, the detection rate increases by at least 5.3%.

The thesis contains 60 figures, 14 tables and 120 references.
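A minimal sketch of the pipeline in (3), under several stated assumptions:
- The bending-stiffness mapping table is not reproduced; `exon_score` takes an already-mapped numeric sequence.
- Sifting uses piecewise-linear envelopes (`np.interp`) and a fixed number of iterations, instead of the cubic-spline envelopes and stopping criterion of standard EMD.
- A plain complex Gabor kernel tuned to period 3 stands in for the thesis's modified Gabor wavelet transform. The abstract does not describe the modification.

As in the method, only the first IMF is extracted. All function names and parameter values are hypothetical.

```python
import numpy as np


def local_extrema(h: np.ndarray):
    """Indices of interior local maxima and minima."""
    d = np.diff(h)
    maxima = np.nonzero((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.nonzero((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def envelope(idx: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Piecewise-linear envelope through h[idx], pinned to both end samples."""
    knots = np.concatenate(([0], idx, [h.size - 1]))
    return np.interp(np.arange(h.size), knots, h[knots])


def first_imf(x, n_sift: int = 10) -> np.ndarray:
    """Simplified EMD: sift out the first (highest-frequency) intrinsic mode function."""
    h = np.asarray(x, dtype=float).copy()
    for _ in range(n_sift):
        maxima, minima = local_extrema(h)
        if maxima.size < 2 or minima.size < 2:
            break
        # Subtract the mean of the upper and lower envelopes.
        h = h - 0.5 * (envelope(maxima, h) + envelope(minima, h))
    return h


def gabor_local_power(x, period: float = 3.0, sigma: float = 30.0) -> np.ndarray:
    """|x * g|^2 for a complex Gabor kernel g tuned to `period` samples (unit gain there)."""
    x = np.asarray(x, dtype=float)
    n = np.arange(-int(4 * sigma), int(4 * sigma) + 1)
    gauss = np.exp(-n ** 2 / (2 * sigma ** 2))
    kernel = gauss * np.exp(2j * np.pi * n / period) / gauss.sum()
    if kernel.size > x.size:
        raise ValueError("sequence shorter than the Gabor kernel")
    return np.abs(np.convolve(x, kernel, mode="same")) ** 2


def exon_score(mapped_sequence) -> np.ndarray:
    """Local period-3 power of the first IMF of an already-mapped DNA sequence."""
    return gabor_local_power(first_imf(mapped_sequence))
```

Computing the local power only on the first IMF discards the slower components that sifting moves into later IMFs. This mirrors the noise-suppression argument in the abstract.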
[Degree-granting institution]: Nankai University
[Degree level]: Doctorate
[Year of award]: 2014
[Classification number]: Q811.4; TN911.7
Document no.: 1405423
Link: http://sikaile.net/kejilunwen/wltx/1405423.html