
Prediction of Protein Post-Translational Modification Sites Based on Ensemble Learning and Flexible Neural Trees

Published: 2018-10-22 17:20
[Abstract]: Post-translational modifications (PTMs) of proteins play a vital role in cellular life processes. Multiple PTMs influence and coordinate with one another, jointly maintaining and promoting the normal course of cellular activities. However, identifying PTMs experimentally is usually laborious and inefficient. It is therefore imperative to develop effective bioinformatics prediction tools that make the identification of modification sites more efficient. Taking protein sequences as the basic object of study and combining several feature extraction methods, this thesis uses computational methods to predict the sites of two PTMs: phosphorylation and phosphoglycerylation.

For phosphorylation, starting from the function of the modification, a number of protein sequences related to signal transduction were extracted from a phosphorylation database to construct a data set. For feature extraction, a new method is proposed that incorporates the grouping of amino acid residues by physicochemical properties into features based on the frequency with which residues occur in a sliding window. Experiments show that, once the grouping information is included, prediction results for the same type of modification site under the same prediction model improve substantially: the accuracy of a neural network optimized by particle swarm optimization rises from about 58% to 86%. On this basis, preliminary experiments were also run on how the length of the residue window affects the results. Prediction is best when the protein microsequence contains 23 amino acid residues.
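The feature extraction idea described above can be sketched as follows. This is a minimal illustration, not the thesis's code: the abstract does not state which physicochemical grouping was used, so the four-group scheme in `GROUPS`, the padding character `X`, and the function names `extract_window` and `grouped_frequency_features` are all assumptions made for the example. Only the window length of 23 comes from the abstract.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical grouping by side-chain character (an assumption for this
# sketch; the abstract does not say which grouping the thesis used).
GROUPS = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STNQYC",
    "positive": "KRH",
    "negative": "DE",
}


def extract_window(sequence, site, window=23, pad="X"):
    """Return the microsequence of length `window` centred on `site`.

    `site` is a 0-based index into `sequence`. Positions that fall outside
    the protein are filled with `pad` (an assumed placeholder residue).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the site sits at the centre")
    half = window // 2
    padded = pad * half + sequence + pad * half
    return padded[site:site + window]


def grouped_frequency_features(window_seq):
    """Concatenate per-residue and per-group frequencies for one window.

    The first 20 values are the frequencies of the standard amino acids in
    the window; the last len(GROUPS) values are the frequencies of each
    physicochemical group.
    """
    counts = Counter(window_seq)
    n = len(window_seq)
    residue_freq = [counts[a] / n for a in AMINO_ACIDS]
    group_freq = [
        sum(counts[a] for a in members) / n for members in GROUPS.values()
    ]
    return residue_freq + group_freq


if __name__ == "__main__":
    protein = "MKTAYSAKQRSLEDGTVNWYHPC"  # made-up sequence for illustration
    micro = extract_window(protein, site=5)  # serine at index 5
    print(micro, len(grouped_frequency_features(micro)))
```

Each candidate site then yields a fixed-length vector (24 values under this assumed grouping) that any of the classifiers named in the abstract could consume.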
The data set was then organized for ten-fold cross-validation, and experiments were run with the new feature extraction method using an ensemble of three models: a neural network, a support vector machine, and a flexible neural tree. The three models are combined by majority voting. The results show that the ensemble reaches a prediction accuracy of 87.50%, a considerable improvement over previous results.

For phosphoglycerylation, a flexible neural tree model was used to predict modification sites, and the results were compared with the most recent work in the field. The data set was processed by ten-fold cross-validation, and the window size of the protein microsequences follows the conclusions of earlier researchers. The results show that the flexible neural tree has a clear advantage when positive and negative samples are equal in number: its prediction accuracy exceeds 90%, far higher than previously published results. The highest Matthews correlation coefficient (MCC) obtained by the flexible neural tree is 0.807. As the proportion of negative samples grows, accuracy increases but the MCC gradually decreases. When the data set contains all samples, the MCC falls to 0.326, a large drop, which shows that the imbalance between positive and negative samples strongly affects the results.

In summary, building on the new feature extraction method, this thesis predicts protein phosphorylation sites with an ensemble of several prediction models, and the ensemble performs well. It also applies the flexible neural tree model to phosphoglycerylation site prediction, where the model improves prediction performance considerably over the most recent results.
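Two ideas from the abstract lend themselves to a short sketch: combining three classifiers by majority vote, and the way accuracy and the Matthews correlation coefficient can move in opposite directions as negatives come to dominate. The code below is an illustration under stated assumptions, not the thesis's implementation. The function names are hypothetical, and the confusion-matrix counts in the demo are invented (a classifier with fixed 80% sensitivity and 90% specificity), not results from the thesis.

```python
import math


def majority_vote(*model_predictions):
    """Combine 0/1 predictions from an odd number of models by majority."""
    if len(model_predictions) % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    return [
        1 if 2 * sum(votes) > len(votes) else 0
        for votes in zip(*model_predictions)
    ]


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Three hypothetical models voting on four candidate sites.
    nn = [1, 0, 1, 1]
    svm = [1, 1, 0, 1]
    fnt = [0, 0, 1, 1]
    print(majority_vote(nn, svm, fnt))

    # Invented counts: 100 positives, same sensitivity and specificity,
    # first with 100 negatives and then with 1000 negatives.
    for counts in [(80, 90, 10, 20), (80, 900, 100, 20)]:
        print(counts, round(accuracy(*counts), 3), round(mcc(*counts), 3))
```

With these invented counts, the tenfold increase in negatives raises accuracy while lowering the MCC, the same qualitative pattern the abstract reports for unbalanced data. This is why the MCC is the more informative figure when positive sites are rare.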
[Degree-granting institution]: University of Jinan
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: Q51;TP18



