Analysis and Comparison of Several Key Algorithms in Protein Structure and Function Prediction

Published: 2019-02-21 15:44

[Abstract]: With the rapid development of sequencing technology, the gap between the number of known protein sequences and the number of proteins whose structure and function have been determined keeps widening, so computational methods for predicting protein structure and function are urgently needed. Many effective methods have been proposed to study the relationships among protein sequence, structure and function, but different methods perform best on different problems. This thesis therefore focuses on the methods themselves and systematically compares the efficiency of different feature extraction methods, feature selection methods and prediction algorithms on five tasks: protein structural class, protein disorder, protein molecular chaperone, protein solubility and RNA-binding protein prediction. The main contents are as follows:
1. The background and significance of protein research and the composition, structure and physicochemical properties of proteins are briefly introduced, together with the commonly used databases and the data sets used in this thesis. This provides the theoretical and data foundation for the work.
2. Amino acid reduction and feature extraction methods in protein structure and function prediction are analyzed and compared. The 20 amino acids are reduced to k classes according to 522 amino acid properties, six different types of protein information are extracted, and the efficiency of the reduction and extraction methods is compared using a support vector machine (SVM). The results show that for protein structural class and molecular chaperone prediction it is best to reduce the 20 amino acids using properties of the turn-propensity type and then extract sequence-order features, whereas protein solubility prediction favors the RCTD feature extraction method.
3. Feature selection methods in protein structure and function prediction are analyzed and compared. Sixteen methods, including feature selection based on mutual information and feature selection based on support vector machines, are compared using the K-nearest neighbor (KNN) prediction algorithm. The results show that feature selection based on a nonlinear SVM performs best in protein structural class, protein solubility and protein molecular chaperone prediction. After selection, accuracy improves by 13.16%-71%, most notably for the k-mer and PSSM features.
4. Prediction algorithms in protein structure and function prediction are analyzed and compared. Seven algorithms, including linear discriminant analysis and the principal component analysis discriminant algorithm (PCADA), are selected and their efficiency is compared. The results show that in protein structural class prediction the SVM performs best, especially combined with the PRseAAC feature, reaching a prediction accuracy of 99.15%. Molecular chaperones can be predicted fairly accurately with PCADA, CART, PLSDA, KNN or SVM. In protein disorder prediction, KNN combined with the RCTD feature performs best, with an accuracy of 94.75%. Protein solubility prediction should use the PSSM feature combined with PLSDA or PCADA. For RNA-binding protein prediction, combining the GO feature with either CART or PLSDA gives good prediction accuracy.
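The reduction-then-extraction step described in item 2 can be sketched roughly as follows. This is a minimal illustration only: the six-class grouping `EXAMPLE_GROUPS`, the function names, and the choice of a k-mer composition as the extracted feature are all assumptions made here for demonstration. The abstract does not list the actual groupings derived from the 522 amino acid properties, nor the exact definitions of the six feature types.

```python
from collections import Counter
from itertools import product

# Hypothetical 6-class grouping of the 20 standard amino acids, chosen only
# for illustration; it is NOT the grouping used in the thesis.
EXAMPLE_GROUPS = {
    "A": "AVLIMC",  # aliphatic / sulfur-containing
    "R": "FWYH",    # aromatic
    "P": "STNQ",    # polar uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # small / turn-forming
}


def build_mapping(groups):
    """Invert {class_label: residues} into {residue: class_label}."""
    mapping = {}
    for label, residues in groups.items():
        for residue in residues:
            mapping[residue] = label
    return mapping


def reduce_sequence(sequence, mapping):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues outside the mapping (e.g. X, B, Z) are skipped.
    """
    return "".join(mapping[r] for r in sequence.upper() if r in mapping)


def kmer_composition(reduced, alphabet, k=2):
    """Return normalized k-mer frequencies over the reduced alphabet.

    The vector has len(alphabet) ** k entries, ordered lexicographically.
    """
    kmers = ["".join(p) for p in product(sorted(alphabet), repeat=k)]
    windows = len(reduced) - k + 1
    counts = Counter(reduced[i:i + k] for i in range(windows))
    total = max(windows, 1)  # avoid division by zero for short sequences
    return [counts[kmer] / total for kmer in kmers]


MAPPING = build_mapping(EXAMPLE_GROUPS)
reduced = reduce_sequence("MKVLAAGDE", MAPPING)
print(reduced)  # AKAAAAGDD
features = kmer_composition(reduced, EXAMPLE_GROUPS.keys(), k=2)
print(len(features))  # 36
```

Reducing the alphabet before counting shrinks the feature space (here 36 dipeptide features instead of 400), which is one reason reduced alphabets are often paired with k-mer style features. The resulting vectors could then be passed to a classifier such as an SVM or KNN.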
[Degree-granting institution]: Zhejiang Sci-Tech University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: Q51


Article ID: 2427636


Article link: http://sikaile.net/shoufeilunwen/benkebiyelunwen/2427636.html
