Prediction of Protein Post-Translational Modification Sites Based on Ensemble Learning and Flexible Neural Trees
[Abstract]: Post-translational modification (PTM) of proteins plays an important role in cellular life. Many kinds of PTMs interact and coordinate with one another, jointly maintaining and promoting normal cellular activities. However, identifying PTMs experimentally is often complicated and inefficient, so effective bioinformatics prediction tools are needed to make the identification of modification sites more efficient. This thesis takes the protein sequence as its basic object of study and, by combining several feature extraction methods, computationally predicts the sites of two PTMs: phosphorylation and glycerol phosphate modification. Based on the function of phosphorylation, protein sequences related to signal transduction were extracted from a phosphorylation database to construct the data set. For feature extraction, a new method is proposed that incorporates grouping information on the physicochemical properties of amino acid residues into features based on the frequency with which residues appear in a sliding window. Experiments showed that, under the same prediction model, prediction of the same modification sites improved greatly once the physicochemical properties were fused in: the accuracy of a neural network model optimized by particle swarm optimization rose from about 58% to 86%. The influence of the window size, that is, the number of amino acid residues in each protein fragment, was also studied. The results show that prediction is best when the fragments contain 23 amino acid residues.
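The sliding-window feature extraction described above can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the four property groups in `GROUPS`, the padding character `X`, and the function names are all assumptions made for the example, since the abstract does not state which grouping or padding scheme was used.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical physicochemical grouping (an assumption for illustration;
# the abstract does not specify the grouping actually used).
GROUPS = {
    "hydrophobic": "AVLIMFWPG",
    "polar": "STCYNQ",
    "positive": "KRH",
    "negative": "DE",
}


def extract_window(sequence, site, window=23, pad="X"):
    """Return the fragment of length `window` centred on the 0-based `site`.

    Positions beyond either terminus are filled with the `pad` character.
    """
    half = window // 2
    padded = pad * half + sequence + pad * half
    return padded[site:site + window]


def window_features(sequence, site, window=23):
    """Concatenate per-residue frequencies with per-group frequencies."""
    fragment = extract_window(sequence, site, window)
    counts = Counter(fragment)
    residue_freq = [counts[a] / window for a in AMINO_ACIDS]
    group_freq = [
        sum(counts[a] for a in members) / window
        for members in GROUPS.values()
    ]
    return residue_freq + group_freq
```

With these assumptions each candidate site yields a 24-dimensional vector (20 residue frequencies plus 4 group frequencies). The default window of 23 follows the optimum reported in the abstract.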
The data set was then partitioned for ten-fold cross-validation and tested with the new feature extraction method using an ensemble of three models: a neural network, a support vector machine, and a flexible neural tree. The three models are combined by majority voting. The experimental results show that prediction accuracy reaches 87.50% after ensemble learning over the three models, a considerable improvement over previous results. A flexible neural tree model was also used to predict glycerol phosphate modification sites, and the results were compared with the latest research in this field. The data sets were evaluated with ten-fold cross-validation, and the window size of the protein fragments follows the conclusions of previous researchers. The results show that the flexible neural tree has a clear advantage when positive and negative samples are equal in number: its prediction accuracy exceeds 90%, much higher than previously published results. Its highest Matthews correlation coefficient (MCC) is 0.807. As the proportion of negative samples increases, accuracy improves but the MCC gradually decreases. When the data set contains all samples, the MCC falls sharply to 0.326. The imbalance between positive and negative samples therefore has a strong influence on the results. In conclusion, with the new feature extraction method, an ensemble of several prediction models was used to predict protein phosphorylation sites, and the ensemble model performs well.
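The majority-vote combination and the two evaluation measures mentioned above (accuracy and the Matthews correlation coefficient) can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, labels are assumed to be 1 for a modified site and 0 otherwise, and the three input lists stand in for the outputs of the neural network, support vector machine, and flexible neural tree.

```python
import math


def majority_vote(*model_predictions):
    """Combine equal-length 0/1 prediction lists from an odd number of models."""
    n_models = len(model_predictions)
    return [
        1 if 2 * sum(votes) > n_models else 0
        for votes in zip(*model_predictions)
    ]


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally defined as 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

The two measures can diverge on imbalanced data: a predictor that labels every sample negative on a set with one positive and nine negatives scores an accuracy of 0.9 but an MCC of 0, which is consistent with the trend the abstract reports as the negative-sample ratio grows.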
In addition, the prediction of glycerol phosphate modification sites was studied using the flexible neural tree model; compared with the latest published results, the model's prediction performance is greatly improved.
[Degree-granting institution]: University of Jinan
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: Q51; TP18