Protein Structural Class Prediction and Quality Assessment Based on Machine Learning
Topic: protein structural class + SVM; Source: Henan Normal University, master's thesis, 2017
[Abstract]: Proteins are the basic organic substances that make up cells and carry out the activities of life. A protein's role is determined by its function, which in turn is determined mainly by its structure, so studying protein structure is essential to understanding protein function. However, because proteins in living organisms are complex and diverse in composition, simulating the folding process directly with molecular dynamics requires both large computational resources and a deep understanding of folding itself, which makes fast, accurate structure prediction and model quality assessment difficult. With advances in computing and information technology, machine learning (ML) approaches to protein structural class prediction and model quality assessment have become an active research topic in bioinformatics. This thesis covers three main areas:

(1) A multi-class model of protein structural classes based on attribute reduction. For proteins with known amino acid sequences, pseudo amino acid composition features are chosen first because they are less prone to losing sequence information. Because this feature representation contains redundant information, and because structural class prediction is a multi-class problem, the ReliefF algorithm is used to reduce the features. A multi-class SVM is then assembled from several binary SVM models and used to assign proteins to structural classes. Compared with the same method without feature reduction, running time was cut by nearly half, but the model parameters remained difficult to choose.

(2) An SAPSO algorithm for optimizing the parameters of the structural class model. To address this parameter-selection problem, the thesis designs a simulated annealing particle swarm optimization (SAPSO) algorithm for the protein classification model, combining the ability of simulated annealing (SA) to escape local optima with the fast convergence of particle swarm optimization (PSO). Protein classification experiments demonstrate that the method is effective.

(3) An ML-based protein model quality assessment model that addresses the failure of traditional quality assessment methods to consider homology information. Each protein sequence is submitted to SWISS-MODEL, which automatically builds its three-dimensional structure. The protein sequence and the Model1 sequence are then submitted to BLAST, and four main features of the resulting sequence alignment are extracted. With homology information thus taken into account, these feature values are used to train an LS-SVM, whose parameters are tuned with SAPSO at the same time. The LS-SVM built from the optimal parameters then predicts the GDT-TS score of each protein model. Test experiments show clear advantages in absolute error and mean squared error, supporting the soundness and effectiveness of the model.
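Point (1) starts from pseudo amino acid composition (PseAAC) features. The abstract does not say which PseAAC variant or which physicochemical properties were used, so the following is only a minimal sketch of Chou's type-1 PseAAC under stated assumptions: property tables are supplied by the caller and standardized over the 20 standard amino acids, and the function names, `lam` (number of sequence-order correlation factors) and the weight `w` are illustrative choices, not the thesis's settings. The usage lines rely on a placeholder property table, not real physicochemical data.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop):
    """Standardize one property table (amino acid -> value) to zero mean, unit variance."""
    vals = [prop[a] for a in AMINO_ACIDS]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    return {a: (prop[a] - mean) / std for a in AMINO_ACIDS}


def pseaac(seq, props, lam=3, w=0.05):
    """Type-1 PseAAC: 20 composition terms followed by `lam` sequence-order terms.

    `props` is a caller-chosen list of property tables (assumption); `lam` and `w`
    are illustrative defaults, not values taken from the thesis.
    """
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    norm = [standardize(p) for p in props]

    def corr(a, b):
        # Mean squared difference of the standardized properties of two residues.
        return sum((t[b] - t[a]) ** 2 for t in norm) / len(norm)

    # theta_j: average correlation between residues that are j positions apart.
    thetas = [
        sum(corr(residues[i], residues[i + j]) for i in range(n - j)) / (n - j)
        for j in range(1, lam + 1)
    ]
    freqs = [residues.count(a) / n for a in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


# Placeholder property table for illustration only; NOT real physicochemical values.
toy_prop = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", [toy_prop], lam=3)
print(len(vec))  # 23
```

The vector always has 20 + `lam` components that sum to 1, so the sequence-order terms extend plain amino acid composition rather than replacing it.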
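Point (1) then prunes redundant features with ReliefF. Below is a minimal sketch of the standard multi-class ReliefF weight update, assuming numeric features whose differences are scaled by each feature's range, a Manhattan-style distance over those scaled differences, and every instance used once as the reference instead of random sampling. The function name, `k` and the made-up toy data are illustrative; the thesis's neighbour count and selection threshold are not given in the abstract.

```python
def relieff(X, y, k=3):
    """Return one ReliefF weight per feature (higher = more relevant)."""
    n, d = len(X), len(X[0])
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)  # avoid division by zero

    def diff(j, a, b):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    weights = [0.0] * d
    for i in range(n):
        ri, ci = X[i], y[i]
        for c in classes:
            # k nearest neighbours of instance i within class c
            # (near hits if c == ci, near misses otherwise).
            cand = sorted((dist(ri, X[t]), t) for t in range(n) if t != i and y[t] == c)
            near = [t for _, t in cand[:k]]
            if not near:
                continue
            for j in range(d):
                contrib = sum(diff(j, ri, X[t]) for t in near) / (n * len(near))
                if c == ci:
                    weights[j] -= contrib  # differing from near hits is penalized
                else:
                    # differing from near misses is rewarded, weighted by class prior
                    weights[j] += prior[c] / (1.0 - prior[ci]) * contrib
    return weights


# Made-up toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [1.0, 0.4], [0.9, 0.8], [1.1, 0.1]]
y = [0, 0, 0, 1, 1, 1]
w = relieff(X, y, k=2)
keep = sorted(range(len(w)), key=lambda j: w[j], reverse=True)[:1]
print(w, keep)
```

Feature reduction then amounts to keeping the top-weighted features; how many to keep is a separate choice the sketch does not settle.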
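Point (1) also builds the multi-class classifier from several binary SVMs. The abstract does not say which decomposition was used, so one-vs-one voting is assumed here. To stay dependency-free, each binary model is a linear SVM trained with the Pegasos sub-gradient method, a stand-in for whatever SVM implementation and kernel the thesis used. All function names, parameter values and the toy points are illustrative.

```python
import itertools
import random


def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Pegasos sub-gradient training of a linear SVM; labels must be +1/-1.

    A constant 1 is appended to each sample so the last weight acts as a bias.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        for x, t in rng.sample(data, len(data)):
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: move toward the sample
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def train_ovo(X, y, **kw):
    """Train one binary SVM per pair of classes (one-vs-one)."""
    models = {}
    for a, b in itertools.combinations(sorted(set(y)), 2):
        idx = [i for i, c in enumerate(y) if c in (a, b)]
        models[(a, b)] = train_linear_svm(
            [X[i] for i in idx], [1 if y[i] == a else -1 for i in idx], **kw)
    return models


def predict_ovo(models, x):
    """Each pairwise model votes; most votes wins (ties go to the first class in sorted order)."""
    votes = {}
    for (a, b), w in models.items():
        score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
        winner = a if score >= 0 else b
        votes[winner] = votes.get(winner, 0) + 1
    return max(sorted(votes), key=lambda c: votes[c])


# Made-up 2-D points for three hypothetical classes "a", "b", "c".
X = [[-2.6, -1.5], [-2.8, -1.3], [-2.4, -1.7],
     [2.6, -1.5], [2.8, -1.3], [2.4, -1.7],
     [0.0, 3.0], [0.2, 3.2], [-0.2, 2.8]]
y = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
models = train_ovo(X, y)
print(predict_ovo(models, [-2.5, -1.6]))  # prints: a
```

With k classes, one-vs-one trains k(k-1)/2 binary models (six for four classes), each on a small, balanced pair of classes; one-vs-rest would need only k models but each trained on imbalanced data.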
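Point (2) combines SA with PSO, but the abstract does not spell out the hybrid. One common SAPSO formulation, assumed here, keeps the standard PSO velocity update but draws the swarm's social leader from the personal bests with Metropolis-style weights exp(-(f_i - f_best)/T) and cools T geometrically, so worse positions can still guide the swarm early on and are followed less often as T falls. The parameter values are illustrative defaults, and the toy objective stands in for what point (2) would actually minimize, for example a cross-validated error as a function of SVM parameters such as the penalty C and the kernel width.

```python
import math
import random


def sapso(f, bounds, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
          t0=1.0, cooling=0.95, seed=0):
    """Minimize f over a box with PSO whose social leader is chosen by an SA rule."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_f = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    best, best_f = pbest[g][:], pbest_f[g]  # best_f always equals min(pbest_f)
    temp = t0
    for _ in range(iters):
        # SA step: a worse personal best is picked as leader with Metropolis-style
        # probability; the current best always has weight exp(0) = 1.
        weights = [math.exp(-(pf - best_f) / temp) for pf in pbest_f]
        leader = rng.choices(pbest, weights=weights)[0]
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (leader[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay in box
            fi = f(pos[i])
            if fi < pbest_f[i]:
                pbest[i], pbest_f[i] = pos[i][:], fi
                if fi < best_f:
                    best, best_f = pos[i][:], fi
        temp *= cooling  # geometric cooling schedule
    return best, best_f


def shifted_sphere(x):
    """Toy objective with its minimum at (1, -2)."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


best, best_f = sapso(shifted_sphere, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_f)  # best should land near [1.0, -2.0]
```

For SVM tuning, each coordinate would typically be a log-scaled parameter (for example log C and log of the kernel width), so the box bounds cover several orders of magnitude.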
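Point (3) trains an LS-SVM regressor on the alignment features to predict GDT-TS. The abstract does not name the kernel, so an RBF kernel is assumed. In the standard LS-SVM regression dual, training reduces to one linear system, [0, 1ᵀ; 1, K + I/γ]·[b; α] = [0; y], where K is the kernel matrix, and a new input x is scored as Σ αᵢ K(x, xᵢ) + b. Here γ (regularization) and σ (RBF width) are the two parameters that SAPSO would tune. The sketch uses a hand-written Gaussian elimination only to stay dependency-free (`numpy.linalg.solve` would do in practice); the toy 1-D data stand in for the four BLAST features.

```python
import math


def rbf(a, b, sigma):
    """Gaussian (RBF) kernel exp(-||a - b||^2 / (2 sigma^2))."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lssvm_fit(X, y, gamma, sigma):
    """Return (alpha, b) from [[0, 1^T], [1, K + I/gamma]] [b, alpha] = [0, y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(X[i], X[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term on the diagonal of K
        A.append(row)
    sol = solve(A, [0.0] + list(y))
    return sol[1:], sol[0]


def lssvm_predict(X, alpha, b, sigma, x):
    """y(x) = sum_i alpha_i K(x, x_i) + b."""
    return sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X)) + b


# Toy 1-D data (stand-in for the alignment features and GDT-TS targets).
X = [[float(i)] for i in range(5)]
y = [math.sin(x[0]) for x in X]
alpha, b = lssvm_fit(X, y, gamma=1e4, sigma=1.0)
print([round(lssvm_predict(X, alpha, b, 1.0, x), 3) for x in X])
```

Unlike a standard SVM, which solves a quadratic program, LS-SVM replaces the inequality constraints with equalities, which is why training is a single linear solve; the price is that every training point gets a nonzero αᵢ.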
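Point (3) predicts GDT-TS and reports absolute and mean squared error. As a reminder of what the target measures, GDT-TS averages, over the cutoffs 1, 2, 4 and 8 Å, the fraction of residues whose Cα atoms in the model lie within that cutoff of the native structure. The real score searches over many superpositions; the sketch below assumes per-residue Cα deviations from one already-computed superposition, so it is a simplification. The function names and sample numbers are illustrative only.

```python
def gdt_ts(deviations, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """GDT-TS from per-residue C-alpha deviations (angstroms) of one superposition.

    Returns a value in [0, 1]; multiply by 100 for the usual percentage scale.
    """
    n = len(deviations)
    return sum(sum(d <= c for d in deviations) / n for c in cutoffs) / len(cutoffs)


def mae(pred, true):
    """Mean absolute error between predicted and reference scores."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def mse(pred, true):
    """Mean squared error between predicted and reference scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


print(gdt_ts([0.5, 1.5, 3.0, 5.0, 9.0]))
print(mae([0.6, 0.4], [0.5, 0.5]), mse([0.6, 0.4], [0.5, 0.5]))
```

In the sample, 1, 2, 3 and 4 of the 5 residues fall within 1, 2, 4 and 8 Å respectively, so the score is the mean of 0.2, 0.4, 0.6 and 0.8.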
[Degree-granting institution]: Henan Normal University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification numbers]: Q51; TP181