Analysis of Transcription Factor Binding Sites Based on Structural Data
Published: 2018-04-16 16:23
Topics: gene regulation + protein-nucleic acid complexes; Source: Southeast University, master's thesis, 2005
[Abstract]: The gene is the physical and functional unit of genetic information, and how a gene's function is realized depends on its structure and on the regulation of its expression. Regulation of gene expression is what turns gene structures into a wide variety of gene functions, so studying it is of great significance for understanding life. Regulation of mRNA transcription initiation is the basic control point of this process and its most important step. In essence, transcription factors bind to the corresponding regulatory elements and affect the activity of RNA polymerase, which in turn affects the transcription level of the gene. Because few studies worldwide have so far approached this problem from a structural perspective, this thesis starts from the structural data of protein-nucleic acid complexes, analyzes amino acid-base interaction pairs, and explores methods for predicting transcription factor binding sites.

All protein-nucleic acid complexes recorded in the PDB macromolecular structure database were retrieved. The three-dimensional structure data of these complexes were processed with software that computes interactions within a complex, yielding the candidate interaction pairs between amino acid side chains and nucleic acids. Based on the protein annotations in the SWISSPROT database, the complexes were then divided into a set related to regulatory processes and a set unrelated to regulation. The interaction pairs between amino acid side chains and nucleic acids (both hydrogen bonds and non-bonded interactions) were analyzed statistically. Analysis of the local environment of the amino acid residues of transcription factors that interact with DNA showed that some three-residue or five-residue fragments always bind DNA. We therefore hypothesize that, in transcription-related protein-nucleic acid complexes, the surrounding amino acid residue environment or base environment determines to some extent whether the central residue or base is an interaction site.

A non-redundant data set of protein-nucleic acid complexes was processed to extract binding sequences and interaction site information, and machine learning methods were used for a preliminary exploration of the binding patterns between proteins and nucleic acids. A back-propagation neural network was built that uses the protein-nucleic acid binding information, with parameters adjusted over repeated rounds of training, to predict the binding residues of DNA-binding proteins. Local environment information was found to predict the binding residues of a given protein reasonably well, with an NP of 65.85%. A support vector machine (SVM) was used to predict binding bases. Different window widths and parameters were tried for training and prediction, and the results were compared with the neural network method. A base environment with a window length of 11 gave relatively good prediction performance. With a radial basis kernel function, the proportion of bases successfully predicted as binding bases by the SVM reached 89.72%, and the sensitivity reached 66.71%. Finally, a comparison of the two methods showed that the SVM was the more successful at predicting binding bases.

Using interaction-pair data from protein-nucleic acid complexes and machine learning methods, this thesis makes a preliminary exploration of the binding patterns between proteins and nucleic acids. The results demonstrate that, in transcription-related protein-nucleic acid complexes, the local residue environment or base environment determines to some extent whether the central residue or base is an interaction site.
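The window-based "base environment" described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the windows were encoded or exactly how the reported percentages were computed, so the one-hot encoding, the zero padding at the sequence ends, and the function names `encode_windows` and `sensitivity_and_precision` are all assumptions made for illustration, not the thesis's actual method.

```python
from typing import List, Sequence, Tuple

BASES = "ACGT"


def encode_windows(seq: str, window: int = 11) -> List[List[int]]:
    """One-hot encode the window centred on each base of `seq`.

    Assumption: positions that fall outside the sequence (or hold a
    character other than A/C/G/T) are encoded as all zeros.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central base")
    half = window // 2
    features: List[List[int]] = []
    for centre in range(len(seq)):
        vec: List[int] = []
        for pos in range(centre - half, centre + half + 1):
            one_hot = [0] * len(BASES)
            if 0 <= pos < len(seq) and seq[pos] in BASES:
                one_hot[BASES.index(seq[pos])] = 1
            vec.extend(one_hot)
        features.append(vec)
    return features


def sensitivity_and_precision(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, precision) for binary labels, 1 = binding."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    # Hypothetical toy sequence; each base yields an 11 x 4 = 44-dim vector.
    toy_features = encode_windows("ACGTACGTACGTA")
    print(len(toy_features), len(toy_features[0]))
```

Each feature vector describes one central base together with its neighbours, and the label for that vector would be whether the central base is an interaction site. Vectors of this kind could then be passed to any classifier, for example an SVM with a radial basis kernel as the abstract mentions.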
[Degree-granting institution]: Southeast University
[Degree level]: Master's
[Year degree conferred]: 2005
[Classification number]: R346
Article ID: 1759717
Article link: http://sikaile.net/yixuelunwen/binglixuelunwen/1759717.html