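Feature Analysis of Influenza Virus DNA Sequences Based on Time Series Theory and Methods
Published: 2018-05-11 21:41
Topics: influenza virus + DNA sequence; Source: Jiangnan University, master's thesis, 2011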
[Abstract]: Influenza is a recurrent infectious disease that causes high morbidity and mortality worldwide. Influenza viruses are divided into three types: A, B, and C. Of the three, influenza A virus is the deadliest and causes serious disease in humans. The 2009 pandemic, together with the several influenza outbreaks of the 20th century, shows that our understanding of influenza viruses is still incomplete and that many of their properties remain to be explored. Because influenza viruses pose a serious threat to human health, further study of their DNA and protein sequences is urgent; characterizing these sequences matters for influenza prevention, new vaccine development, drug molecular design, control, and treatment.
After introducing the research background of bioinformatics, this thesis presents its main tool for studying the properties of biological sequences: time series theory, which analyzes, forecasts, and controls by processing dynamic data. The definitions, properties, and methods of the ARIMA(p,d,q) and ARFIMA(p,d,q) models used here are described as theoretical preparation for studying influenza virus DNA and protein sequences. Influenza virus DNA sequences are converted into CGR radian sequences based on CGR coordinates, and the long-memory ARFIMA model is introduced to analyze them. Ten H1N1 sequences and ten H3N2 sequences drawn at random from influenza A virus DNA sequences all show long-range correlation and are fitted well. The two subtypes can also tentatively be identified by different ARFIMA models: ARFIMA(0,d,5) for H1N1 and ARFIMA(1,d,1) for H3N2. DNA sequences of influenza B and C viruses were then analyzed. Ten randomly chosen type B sequences and ten type C sequences likewise show long-range correlation and are fitted well, and these two types can also tentatively be identified by different ARFIMA models. As a classical time series model with well-developed algorithms, the ARFIMA model can help uncover unknown properties of influenza virus DNA sequences.
The ARIMA model is used to predict the bases of DNA sequences of the H1N1 subtype of influenza A virus, which is of considerable value for H1N1 research. We selected 41 H1N1 influenza virus sequences with relatively high homology from 1970 to 2010 and used ARIMA(p,d,q) models to fit and forecast the first 20 positions. With very few exceptions, the original data fall within the forecast region, indicating that the models are reasonable and forecast well. The same method was then applied to hemagglutinin amino acid sequences of the H1N1 subtype of influenza A virus, and the forecasts were again found to be good.
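The abstract names two building blocks without defining them: converting a DNA sequence into a CGR radian sequence, and the fractional differencing that gives ARFIMA its long memory. The following is a minimal sketch of both, under stated assumptions rather than the thesis's actual procedure. It assumes CGR means the chaos game representation on the unit square, a corner assignment A(0,0), C(0,1), G(1,1), T(1,0), and a "radian" value defined as atan2(y, x) of each CGR point; none of these are given in the abstract. The function names are hypothetical. The weights of (1 - B)^d follow the standard binomial expansion.

```python
import math

# Assumed corner assignment for the chaos game representation (CGR);
# the thesis abstract does not state which convention it uses.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_coordinates(seq):
    """Return the CGR points of a DNA string, starting from the square's centre."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]  # raises KeyError on ambiguous bases such as N
        # Each new point is the midpoint between the previous point and the corner.
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cgr_radians(seq):
    """Map each CGR point to an angle in radians (assumed definition: atan2(y, x))."""
    return [math.atan2(y, x) for x, y in cgr_coordinates(seq)]


def frac_diff_weights(d, n):
    """First n coefficients of (1 - B)^d, the fractional differencing filter."""
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_diff(series, d):
    """Apply (1 - B)^d to a series, truncating the filter at the start of the data."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]
```

In this sketch, `frac_diff(cgr_radians(sequence), d)` would yield the series to which the ARMA(p, q) part of an ARFIMA(p,d,q) model is fitted. Estimating d and the ARMA coefficients is left to a statistical package and is not shown here.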
[Degree-granting institution]: Jiangnan University
[Degree level]: Master's
[Year degree granted]: 2011
[Classification number]: R346
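[Citing literature]
Related journal articles (1)
1. Zhang Ling; Gao Jie. Prediction of protein sequences of influenza A (H1N1) virus [J]. Biotechnology, 2012, No. 06.
Article ID: 1875751
Article link: http://sikaile.net/xiyixuelunwen/1875751.html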