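Machine-Learning-Based Prediction of the Subcellular Localization of Apoptosis Proteins
Published: 2018-07-01 11:15
Topics: apoptosis proteins + subcellular localization; Source: master's thesis, Zhengzhou University of Light Industry, 2017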
[Abstract]: Apoptosis, also known as programmed cell death, is the final stage of a cell's life. It is an important component of many biological processes and plays an important role in maintaining tissue homeostasis. Apoptosis proteins play a key role in the mechanism of programmed cell death, so knowing their subcellular locations helps us understand how apoptosis works. As the number of known proteins grows exponentially, annotation by biological experiment can no longer keep up with researchers' needs, and more and more researchers turn to machine learning to predict protein subcellular localization. This thesis studies the prediction of apoptosis protein subcellular localization with machine learning methods and makes three contributions. (1) Because feature extraction based on sequence information can no longer improve prediction performance, the thesis represents each protein by the GO (Gene Ontology) annotations of the protein and its homologs instead of by sequence information. Experiments show that the proposed method predicts the subcellular locations of apoptosis proteins significantly better than existing methods. An online prediction website was also built to make the predictor available to other researchers. (2) The CL317 apoptosis protein dataset is severely imbalanced. Earlier work in machine learning has shown that applying traditional learning algorithms directly to such data tends to bias the classifier toward the majority classes, which leads to poor performance on the minority classes. To address this, the thesis builds a new apoptosis protein subcellular localization predictor, GOIL-Apo. It combines random undersampling with multi-class support vector machines into an ensemble of undersampled SVMs to handle the imbalance in CL317, and it constructs a GO vector subspace to avoid the curse of dimensionality that would come from using all GO terms. Experiments show that handling the imbalance effectively improves prediction, and that the predictor significantly outperforms existing methods. (3) Earlier work considered only apoptosis proteins with a single subcellular location and ignored those with multiple locations. The thesis goes further and studies multi-location apoptosis proteins: it constructs a dataset of apoptosis proteins that includes proteins with multiple subcellular locations and proposes a new multi-label algorithm that uses label-specific features. Experiments show that selecting the features most relevant to each location models the multi-location nature of proteins well and achieves good performance. This is the first work in the field to consider multi-location apoptosis proteins, and it provides a useful reference for further research on predicting them.
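As a rough illustration of point (2) of the abstract, the sketch below builds a GO vector subspace from the terms that actually occur in a training set, then trains an ensemble on randomly undersampled, class-balanced subsets and combines the members by majority vote. It is not the GOIL-Apo implementation: all function names, the placeholder GO identifiers, and the location labels are hypothetical, and a nearest-centroid classifier stands in for the multi-class SVM so the example needs no third-party libraries.

```python
import random
from collections import Counter

def build_go_subspace(go_annotations):
    """Keep only GO terms seen in the training set (sorted for a stable order)."""
    return sorted({term for terms in go_annotations for term in terms})

def to_vector(terms, subspace):
    """Binary vector: 1.0 if the protein (or a homolog) carries the GO term."""
    present = set(terms)
    return [1.0 if term in present else 0.0 for term in subspace]

def undersample(labels, rng):
    """Indices of a balanced subset: every class is cut to the minority size."""
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    chosen = []
    for idx in by_class.values():
        chosen.extend(rng.sample(idx, n_min))
    return chosen

def fit_centroids(X, y):
    """Stand-in base learner (assumption: NOT an SVM): one mean vector per class."""
    sums, counts = {}, Counter(y)
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict_centroid(model, x):
    """Predict the class whose centroid is closest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))
    return min(sorted(model), key=lambda label: dist(model[label]))

def fit_ensemble(X, y, n_members=5, seed=0):
    """Train one base learner per independently undersampled balanced subset."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        idx = undersample(y, rng)
        members.append(fit_centroids([X[i] for i in idx], [y[i] for i in idx]))
    return members

def predict_ensemble(members, x):
    """Majority vote over the ensemble members."""
    votes = Counter(predict_centroid(m, x) for m in members)
    return votes.most_common(1)[0][0]

# Toy data: placeholder GO identifiers and labels, imbalanced 4 vs. 2.
annotations = [{"GO:a", "GO:b"}, {"GO:a"}, {"GO:a", "GO:c"}, {"GO:a", "GO:b"},
               {"GO:d"}, {"GO:d", "GO:e"}]
labels = ["cyto", "cyto", "cyto", "cyto", "mito", "mito"]

subspace = build_go_subspace(annotations)
X = [to_vector(terms, subspace) for terms in annotations]
ensemble = fit_ensemble(X, labels)
print(predict_ensemble(ensemble, to_vector({"GO:d"}, subspace)))
```

Each ensemble member sees all minority-class examples but a different random sample of the majority class, so no single member is dominated by the majority while the ensemble as a whole still uses most of the data. To approximate the thesis setup more closely, `fit_centroids` and `predict_centroid` could be replaced with a multi-class SVM from a library of your choice.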
[Degree-granting institution]: Zhengzhou University of Light Industry
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification numbers]: Q25; TP181