
Protein Secondary Structure Prediction Based on a Balanced Classification Algorithm

Published: 2018-07-15 09:00
[Abstract]: Proteins play a key role in life processes and are the material carriers of life activities. Because a protein's structure determines its function, predicting function from structure is important. Protein structure is described at four levels: the primary structure is the order of amino acid residues in the sequence; the secondary structure is the local spatial conformation of the polypeptide chain (helix, strand, and coil); the tertiary structure is the spatial position of every atom in the polypeptide chain; and proteins with several polypeptide chains also have a quaternary structure, the relative arrangement of those chains. The tertiary structure is the level most directly related to function, but it is hard to obtain directly: traditional physicochemical methods are time-consuming and labor-intensive, and predicting tertiary structure directly from the primary sequence is extremely difficult. Secondary structure prediction therefore has broad prospects as a bridge between primary and tertiary structure. However, strand (beta-sheet) content in proteins is generally low, and traditional machine learning classifiers cannot capture long-range interactions between distant sites in the primary sequence, so strands are under-predicted and overall secondary structure prediction suffers. This thesis attempts to improve the existing PSIPRED algorithm (a classifier based on artificial neural networks that takes the position-specific scoring matrix of a sequence as input) by introducing a balanced classification mechanism that makes the predictions more balanced and effective, and then applies the result to predicting protein structural classes at the tertiary level.
The improvements and innovations are as follows. 1. Four strategies are tried: changing the input encoding of the neural network to add sequence information related to long-range interactions, such as residue molecular weight, isoelectric point, and hydrophilicity; balanced sampling, in which under-represented structures are sampled repeatedly during training; a weighted cost function during training; and weighting the outputs of the neural network to balance the classifier's decisions. Weighting the network outputs proves the most effective: with 8-fold cross-validation on an improved CB513 dataset, the overall accuracy is 74.28% and the corresponding beta-sheet accuracy is 63.73%, which is 2.34 percentage points higher than the original method. 2. The chaos game representation (CGR) of the predicted secondary structure is used as the input feature of a neural network for protein structural class prediction. This reaches 71% accuracy on the Astral40 dataset, much higher than the CGR method applied directly to primary sequence information. The method can therefore predict protein structural classes fairly effectively.
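The output-weighting strategy named in the abstract can be illustrated with a small sketch. The abstract gives neither the weighting scheme nor the weight values, so everything here is an assumption for illustration: the function names `weighted_decision` and `predict_sequence`, the multiplicative form of the weighting, the strand weight of 1.3, and the example scores are hypothetical and are not taken from the thesis or from PSIPRED.

```python
from typing import Sequence

# Three-state labels: helix, strand, coil.
CLASSES = ("H", "E", "C")


def weighted_decision(scores: Sequence[float], weights: Sequence[float]) -> str:
    """Return the class whose score, after multiplying by its weight, is largest."""
    if len(scores) != len(CLASSES) or len(weights) != len(CLASSES):
        raise ValueError("expected one score and one weight per class")
    weighted = [s * w for s, w in zip(scores, weights)]
    return CLASSES[weighted.index(max(weighted))]


def predict_sequence(score_rows, weights=(1.0, 1.0, 1.0)) -> str:
    """Apply the weighted decision to every residue's output scores."""
    return "".join(weighted_decision(row, weights) for row in score_rows)


# Hypothetical per-residue network outputs in (H, E, C) order; not thesis data.
example_scores = [
    (0.70, 0.10, 0.20),
    (0.40, 0.35, 0.25),
    (0.20, 0.38, 0.42),
]

# Plain argmax versus an assumed up-weighting of the rarer strand class.
unweighted = predict_sequence(example_scores)
reweighted = predict_sequence(example_scores, weights=(1.0, 1.3, 1.0))
print(unweighted, reweighted)
```

In this toy input, raising the strand weight turns two borderline residues into strand calls while the confident helix call is unchanged. In practice such weights would presumably be tuned on held-out data so that the gain in strand accuracy does not cost too much overall accuracy.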
[Degree-granting institution]: Henan University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: Q51;TP183

Article ID: 2123536

Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2123536.html

