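Research and Application of Protein Binding Site Prediction Methods
Published: 2018-03-12 14:46
Topic: protein binding sites. Focus: amino acid composition preference. Source: Dalian University of Technology, doctoral dissertation, 2012. Document type: degree thesis.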
[Abstract]: Biomolecules and many other organic ligands bind with high affinity to specific sites on protein surfaces. How to distinguish such binding sites from the rest of the protein surface is a frontier problem in protein research. In recent years, predicting likely binding regions on the protein surface has become increasingly valuable. As structural knowledge of biologically and medically important proteins keeps growing, such prediction methods are becoming more practical. They can support rational drug design and can also help reveal protein function. Both applications, function prediction and rational drug design, need a reliable method for identifying and delineating protein-ligand binding sites. When the three-dimensional structure of a protein complex is known, protein-protein interaction interfaces and protein-ligand binding surfaces can be analyzed systematically in terms of amino acid distribution and physicochemical features, which makes it possible to identify active sites. Many computational methods have been developed that use this information to predict likely binding sites. However, current methods still fall short in prediction accuracy and efficiency, so further work on binding site prediction is needed to improve predictive power and to reveal the key factors that influence it.
This thesis studies methods for predicting protein binding sites and consists of four main parts.
Chapter 1 first describes the principles of protein-ligand interaction, including thermodynamic theory, theoretical models of the binding process, and physical properties. It then surveys the state of research on protein binding site prediction, covering both protein-ligand and protein-protein binding site prediction. Finally, it briefly introduces the main work of the thesis and the results obtained.
Chapter 2 proposes two new models for representing amino acid composition preference, which take atoms and atom contact pairs, respectively, as the statistical unit, in contrast to traditional models that use residues. Tests of a ligand-binding pocket identification method based on global pocket preference show that the atom-based and atom-contact-pair-based models outperform the residue-based model. Because binding sites contain so-called hotspot regions, we define the local region with the highest preference value as the hotspot of a pocket and let this local preference value represent the preference of the whole pocket; combining it with pocket size gives a ligand-binding pocket identification method based on local pocket preference. The analysis shows that the two attributes reinforce each other and greatly improve identification; compared with several published prediction methods, our method achieves comparable accuracy while being computationally simple.
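The composition-preference idea in Chapter 2 can be illustrated with a minimal sketch. The abstract does not give the formula, so this assumes a log-ratio preference (how often an atom type occurs in known binding sites relative to how often it occurs on the whole surface). The function names, the atom-type labels, and the pseudocount are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def atom_propensities(site_atoms, surface_atoms, pseudocount=1.0):
    """Log-ratio preference of each atom type for binding sites.

    site_atoms and surface_atoms are iterables of atom-type labels
    (hypothetical labels such as "ASP:OD1"). surface_atoms is assumed
    to include the binding-site atoms as well.
    """
    site = Counter(site_atoms)
    surf = Counter(surface_atoms)
    types = set(site) | set(surf)
    # The pseudocount keeps rare atom types from producing log(0).
    n_site = sum(site.values()) + pseudocount * len(types)
    n_surf = sum(surf.values()) + pseudocount * len(types)
    return {
        t: math.log(((site[t] + pseudocount) / n_site)
                    / ((surf[t] + pseudocount) / n_surf))
        for t in types
    }


def pocket_score(pocket_atoms, propensity):
    """Mean preference over a region's atoms; unknown types count as 0."""
    atoms = list(pocket_atoms)
    if not atoms:
        return 0.0
    return sum(propensity.get(a, 0.0) for a in atoms) / len(atoms)


def hotspot_score(local_regions, propensity):
    """Highest score among a pocket's local regions (assumed hotspot rule)."""
    return max((pocket_score(r, propensity) for r in local_regions),
               default=0.0)
```

Under these assumptions, `pocket_score` over all atoms of a pocket plays the role of a global pocket preference, while `hotspot_score` over sub-regions of the same pocket plays the role of the local preference. How the thesis defines local regions and combines the score with pocket size is not stated in the abstract.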
Chapter 3 builds on the differences between protein-ligand and protein-protein binding sites in geometric features and physicochemical properties and proposes two residue attribute definition models: a single-patch model and a multi-patch model. The residue features obtained from these models are used with the random forest algorithm to build classifiers that predict binding residues. We also propose a new clustering method to locate and predict binding sites. These methods are applied to the prediction of protein-ligand and protein-protein binding residues, respectively. On the same datasets and with the same success criteria, the random forest classifier based on the single-patch model achieves higher protein-ligand binding site prediction accuracy than Q-SiteFinder, SCREEN, and Morita's method. Likewise, balanced accuracy and CC (Correlation Coefficient) values show that the random forest classifier based on the multi-patch model predicts protein-protein binding residues better than the methods of Yan, Wang, and Chen and Jeong; for protein-protein binding site prediction, the predictors based on the multi-patch model also outperform Bradford and Westhead's method, Bradford and Needham's method, and Higa and Tozzi's method.
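The abstract reports balanced accuracy and CC without defining them. A minimal sketch of both metrics follows, assuming that CC denotes the Matthews correlation coefficient, which is the correlation measure commonly reported for binary residue classifiers; the thesis may define it differently. Labels are 1 for a binding residue and 0 otherwise, and the function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, 1 = binding residue."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2


def correlation_coefficient(y_true, y_pred):
    """Matthews correlation coefficient (assumed meaning of CC)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Both metrics are less sensitive to class imbalance than plain accuracy, which matters here because binding residues are usually a small minority of surface residues.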
Chapter 4 applies the random forest-based binding site prediction method to assist molecular docking. For protein-ligand docking, the predictor is used as a front end to narrow the conformational search space. The docking results show that, for assisting docking, it outperforms the binding site prediction method in the widely used software Accelrys Discovery Studio. For protein-protein docking, the predictor is used as a back end, that is, as a scoring function for selecting near-native conformations. Docking experiments show that the scoring model built on the predictions and the ZDOCK scoring function each have their own strengths in identifying near-native conformations and are largely complementary.
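The back-end use of the predictor described in Chapter 4 can be sketched as a simple re-ranking step. The abstract does not describe the actual scoring model, so this is only an assumed form: each docking decoy is scored by the mean predicted binding probability of the residues in its interface. The residue identifiers, the data layout, and the function names are hypothetical.

```python
def interface_score(interface_residues, predicted_prob):
    """Mean predicted binding probability over a decoy's interface residues.

    predicted_prob maps a residue id to the classifier's probability that
    the residue is a binding residue; residues without a prediction count as 0.
    """
    residues = list(interface_residues)
    if not residues:
        return 0.0
    return sum(predicted_prob.get(r, 0.0) for r in residues) / len(residues)


def rank_decoys(decoys, predicted_prob):
    """Return decoy ids sorted from highest to lowest interface score.

    decoys maps a decoy id to the residue ids found in its interface.
    """
    return sorted(
        decoys,
        key=lambda d: interface_score(decoys[d], predicted_prob),
        reverse=True,
    )
```

For example, with `predicted_prob = {"A10": 0.9, "A11": 0.8, "A50": 0.1}`, a decoy whose interface is `["A10", "A11"]` ranks ahead of one whose interface is `["A50"]`. Because the abstract reports that this kind of score and the ZDOCK score are complementary, a combined ranking would be a natural extension, but the abstract does not say how or whether the two were combined.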
The final part of the thesis summarizes the work and outlines directions for follow-up research.
This work was supported by the National Natural Science Foundation of China project "Research on Grid Computing Methods for Drug Molecule Optimization Design" (No.10772042), the National 863 Science and Technology Program project "New Drug Research and Development Network" (No.2006AA01A124), and the National Key Basic Research Development Program project "Research on New Methods for Simulating Protein Dynamic Behavior and Interactions" (No.2009CB918501).
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Doctorate
[Year degree granted]: 2012
[Classification number]: R341
Article ID: 1602028
Article link: http://sikaile.net/xiyixuelunwen/1602028.html