Research on Gene Identification Algorithms Based on Sequence Statistical Features

Published: 2018-05-18 07:22

Topics: gene identification + multi-feature fusion; source: Harbin Institute of Technology, 2017 master's thesis

[Abstract]: Given the vast whole-genome data now available for model organisms, identifying protein-coding gene sequences efficiently and accurately is of great practical value. For this reason gene identification, as a foundation of bioinformatics research, has long attracted scholarly attention. Traditional approaches rely mainly on laborious biological experiments, which are slow and costly in time and effort. This thesis instead relies on the theory and methods of signal processing, such as the Fourier transform, filter algorithms, intelligent computing, and statistical learning, and studies the problem in depth from the perspective of sequence statistical features. The period-3 property, an important statistical feature, has been widely used in gene identification. To obtain better identification performance, researchers have contributed substantially to the signal filtering of gene sequences and to enhancing the period-3 feature, but significant shortcomings remain. To address the problems of the fixed-step-size LMS adaptive filter algorithm in gene prediction, this thesis combines the feedback output of the system with feature information on the base composition of the gene sequence and proposes an improved variable-step-size LMS adaptive filter algorithm that filters better and strengthens the period-3 feature; its performance is verified through simulation experiments. The results show a fairly clear accuracy advantage over existing algorithms. In addition, because short gene sequences carry weak feature information, which hinders identification, the thesis also proposes an improved algorithm that fuses multiple features with weights set according to the representational power of each single feature. Its identification performance is analyzed with emphasis on a short-gene dataset with sequence lengths below 200 bp; compared with traditional multi-feature fusion algorithms, the proposed algorithm is effective and robust.
Combining these two lines of work, the thesis implements a gene identification system specific to the human genome that brings together the advantages of digital signal processing and multi-feature fusion. Because the system does not depend on traditional machine learning methods such as conditional random fields, hidden Markov models, and support vector machines, it is simple to implement, does not require training and storing large numbers of model parameters, is not overly influenced by the knowledge structure of existing training datasets, and can identify genes in real time. System performance is evaluated comprehensively on the benchmark datasets ALLSEQ and HMR195.
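The period-3 property that the abstract relies on can be illustrated with a short sketch. It computes the classical Fourier spectral measure: each base is mapped to a binary indicator sequence, and the power of its DFT coefficient at frequency N/3 is summed over the four bases. This is only the textbook feature, not the variable-step-size LMS filter or the weighted fusion scheme proposed in the thesis, whose details the abstract does not give. The function names, the window and step sizes, and the division by the sequence length are assumptions made for illustration.

```python
import cmath

BASES = "ACGT"


def period3_power(seq: str) -> float:
    """Spectral power at frequency N/3, summed over the four base
    indicator sequences and divided by the sequence length
    (the normalization is an illustrative assumption)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in BASES:
        # DFT coefficient at k = N/3 of the 0/1 indicator sequence of `base`:
        # only positions where the base occurs contribute a unit phasor.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / n


def sliding_period3(seq: str, window: int = 351, step: int = 3):
    """Return (start, power) pairs for windows slid along the sequence.
    The window and step values are hypothetical defaults."""
    return [
        (start, period3_power(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    codon_like = "ATG" * 50   # perfectly 3-periodic toy sequence
    non_coding = "ACGT" * 30  # 4-periodic toy sequence, no period-3 component
    print(period3_power(codon_like))
    print(period3_power(non_coding))
```

On real data, windows that overlap protein-coding regions tend to score higher than windows in non-coding regions, and a threshold on the sliding-window curve gives a crude exon indicator; the filtering methods discussed in the abstract aim to sharpen that curve.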
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: Q811.4


Article ID: 1904940


Article link: http://sikaile.net/kejilunwen/jiyingongcheng/1904940.html
