
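3D Graphical Representation of Proteins Based on Chaos Game Representation and Its Applications

Published: 2018-01-21 20:20

Keywords: chaos game representation; protein similarity; support vector machine; anticancer peptides. Source: Shandong University, master's thesis, 2017. Document type: degree thesis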


[Abstract]: With the launch of the Human Proteome Project (HPP) and the arrival of the post-genomic era, the life sciences have produced massive amounts of protein sequence data. Processing and analyzing these sequences with molecular-biology techniques is costly in time and materials, and the results can be unstable. Guided by the core idea that "sequence determines structure, and structure determines function", more and more researchers process large numbers of protein sequences with mathematical algorithms and computational techniques, extracting meaningful structural and functional information that in turn guides and supports experimental work. Bioinformatics models for sequence data are widely used across research fields, including drug development, disease diagnosis, and other areas closely tied to human health. Because proteins are complex in composition and diverse in function, protein sequences are far harder to analyze than DNA and RNA sequences. Existing protein sequence analysis tools often suffer from limitations such as weak biological meaning, poor visualizability, high time complexity, and low accuracy. In view of this, starting from the biological background and drawing on information theory and statistics, this thesis proposes a three-dimensional graphical representation of proteins that has low time complexity and clear biological meaning. The representation is then applied to two important areas of bioinformatics, protein sequence similarity analysis and functional protein prediction, to verify its feasibility. The main work is as follows:

1. Based on the characteristics of Chaos Game Representation (CGR), a reverse CGR graphical representation for codons is proposed and, combined with important physicochemical properties of amino acids, protein sequences are mapped one-to-one into three-dimensional space. The reverse CGR model clusters synonymous codons together, which is consistent with the wobble hypothesis in biology. Building on an efficient momentum vector extraction method, a momentum vector extraction algorithm for 3D curves is then proposed. It removes the effect of differing sequence lengths on downstream applications, greatly reduces time complexity, and improves the ability to handle larger data sets.

2. The new 3D graphical representation is applied to three classic data sets for protein evolutionary analysis and compared with ClustalW and several recent alignment-free algorithms. The reverse CGR representation achieves similar or better results, consistent with the actual evolutionary relationships.

3. To verify that the graphical representation is effective in other kinds of sequence analysis, the vectors extracted from it are fused with statistical information such as amino acid composition and dipeptide composition computed after grouping amino acids by physicochemical properties, and predictors are built with a support vector machine (SVM). Training and prediction are carried out on three kinds of data sets (anticancer peptides, bacterial adhesins, and eukaryotic neurotoxin proteins) and evaluated by five-fold cross-validation. On the anticancer peptide main and alternative data sets, accuracy reaches 96% and 97.73%, far above the other methods in the references. On the two balanced data sets, accuracy reaches 88.82% and 86.11%, similar to the best results of the Tyagi methods; however, the best-performing Tyagi method differs between the two data sets, so the method in this thesis gives good results on both while the Tyagi methods are less stable. On the bacterial adhesin and eukaryotic neurotoxin data sets, prediction accuracy is 92.75% and 98.00% respectively, far above the other methods in the references.

The experiments show that the proposed 3D graphical representation not only has strong biological meaning and low time complexity, but also performs well in sequence similarity analysis and in binary classification of functional proteins, which supports the feasibility and generality of the method.
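For readers unfamiliar with the Chaos Game Representation (CGR) that the abstract builds on, the following is a minimal sketch of the classic two-dimensional CGR for DNA sequences. It is not the thesis's reverse CGR for codons, and it does not include the physicochemical third coordinate, neither of which is specified in the abstract. The corner layout in `CORNERS` and the function name `cgr_points` are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Corner assignment on the unit square. This layout is an assumed,
# commonly used convention; other assignments are equally valid.
CORNERS = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Return the classic CGR trajectory of a DNA sequence.

    Starting from the centre of the unit square, each nucleotide moves
    the current point halfway toward that nucleotide's corner.
    """
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            raise ValueError(f"unsupported symbol: {base!r}")
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


if __name__ == "__main__":
    for point in cgr_points("ATG"):
        print(point)
```

Because each point depends on the whole prefix of the sequence, sequences that share a suffix land close together in the square, which is what makes CGR useful as a basis for similarity analysis.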
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: Q51

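[Similar literature]

Related journal articles (top 1)

1 Shi Long; Huang Hailan; Similarity analysis based on chaos game representation of DNA sequences [J]; Journal of Jishou University (Natural Science Edition); 2009, No. 03

Related master's theses (top 3)

1 Xu Chunrui; 3D Graphical Representation of Proteins Based on Chaos Game Representation and Its Applications [D]; Shandong University; 2017

2 Liu Fei; Research on phylogenetic analysis based on chaos game representation of genomes [D]; Xiangtan University; 2013

3 Li Bo; Markov model simulation of the chaos game representation of complete mitochondrial genomes [D]; Xiangtan University; 2009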


Article ID: 1452473


Link to this article: http://sikaile.net/shoufeilunwen/benkebiyelunwen/1452473.html

