
Research on a Protein-Protein Interaction Prediction Method Based on SVM and Protein Function Annotation

Published: 2018-03-13 21:25

  Topic: protein-protein interaction. Approach: SVM. Source: Jilin University, 2017 master's thesis. Type: degree thesis.


[Abstract]: Protein-protein interaction (PPI) is the process by which two or more protein molecules gradually form a protein complex through physicochemical means. This thesis finds that current PPI data cover too few proteins to meet the needs of practical life-science applications: common PPI databases such as DIP contain only a little over 9,000 human PPIs, whereas commonly used gene expression datasets cover roughly 10,000 to 20,000 genes. In routine differential expression analysis, for example, many differentially expressed genes do not appear in any known PPI, so a large number of interactions between proteins remain to be predicted.

Existing PPI data are obtained mainly by experiment, using techniques such as tandem affinity purification and yeast two-hybrid screening. These experiments achieve relatively high accuracy but take a long time. Computational biology methods can assist PPI prediction and greatly reduce experimental cost and time. Overall, machine-learning algorithms predict interactions between proteins fairly well, but they have their own constraints. First, the machine-learning algorithms applied to PPI prediction are supervised, so PPI datasets for training and testing are indispensable, yet few PPI relations have been established either way, and protein pairs confirmed not to interact are especially rare. Second, feature vectors are built from a single source, either the amino acid sequence or gene co-expression, without considering other biological information related to the PPI itself. Third, the computational cost is high.

To address these problems, this thesis proposes the following:
(1) For the SVM feature representation, in addition to amino acid AC values, protein function annotation data such as GO and KEGG are introduced to construct a new feature vector.
(2) Experimentally determined interacting PPIs serve as the positive training set, and non-interacting PPIs obtained by current experimental or computational means, collected through web search, serve as the negative training set. The SVM is trained and tested on both sets to predict PPIs.
(3) A computational PPI prediction algorithm, PPI_SPFA, is designed and implemented. To reduce the computational cost, it uses a two-step strategy: protein pairs that are very unlikely to interact are filtered out first, and prediction is run only on the remainder. Compared with other algorithms such as PPI_AC and iPPI-Esml, PPI_SPFA predicts PPIs with higher accuracy.
(4) Apart from the PPIs already contained in existing databases such as DIP, the SVM predicts all remaining candidate PPIs, yielding a relatively complete PPI network.

Future work will continue to combine SVM with protein function annotations such as GO and KEGG to explore and improve PPI prediction algorithms, with the aim of raising their accuracy and response speed.
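The feature construction and two-step strategy described in the abstract can be illustrated with a minimal sketch. It is not the thesis's PPI_SPFA implementation, and it rests on several assumptions: "AC" is read as the auto covariance of a residue property along the sequence, GO/KEGG annotations are summarized as a Jaccard overlap of term sets, the pre-filter is a threshold on that overlap, and the trained classifier (for example an SVM) is passed in as a callable. All function names, the `min_overlap` parameter, and the toy property scale are hypothetical.

```python
from typing import Callable, Dict, List, Set


def auto_covariance(seq: str, scale: Dict[str, float], max_lag: int) -> List[float]:
    """Auto covariance of one residue property along a sequence, lags 1..max_lag."""
    if len(seq) <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    values = [scale[res] for res in seq]
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    n = len(centered)
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def annotation_overlap(terms_a: Set[str], terms_b: Set[str]) -> float:
    """Jaccard overlap of two annotation term sets (assumed encoding of GO/KEGG)."""
    union = terms_a | terms_b
    return len(terms_a & terms_b) / len(union) if union else 0.0


def pair_features(
    seq_a: str,
    seq_b: str,
    terms_a: Set[str],
    terms_b: Set[str],
    scale: Dict[str, float],
    max_lag: int,
) -> List[float]:
    """Concatenate both proteins' AC values, then append the annotation overlap."""
    return (
        auto_covariance(seq_a, scale, max_lag)
        + auto_covariance(seq_b, scale, max_lag)
        + [annotation_overlap(terms_a, terms_b)]
    )


def two_step_predict(
    features_by_pair: Dict[str, List[float]],
    classify: Callable[[List[float]], bool],
    min_overlap: float = 0.1,
) -> Dict[str, bool]:
    """Step 1: skip pairs with low annotation overlap. Step 2: classify the rest."""
    predictions: Dict[str, bool] = {}
    for pair, feats in features_by_pair.items():
        if feats[-1] < min_overlap:
            # Hypothetical filter: treated as non-interacting without calling the model.
            predictions[pair] = False
        else:
            predictions[pair] = classify(feats)
    return predictions
```

In practice `classify` could wrap the prediction method of a trained SVM. The point of the filter is that the classifier runs only on pairs that pass the cheap first step, which is where the computational savings would come from.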
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification numbers]: Q51; TP181



Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1608187.html

