
Protein Subcellular Localization Prediction Based on KNN Improved with Similarity Alignment

Published: 2019-05-07 09:03
[Abstract]: A protein's function is closely tied to its subcellular location, so predicting that location helps reveal functional information and is important for biological research. Determining subcellular locations experimentally is slow and costly, and it does not scale to large numbers of protein sequences, so an efficient prediction method is needed. This thesis introduces feature extraction algorithms for protein sequences, improves the traditional k-nearest-neighbor (KNN) classifier, and proposes a protein subcellular location prediction algorithm based on KNN improved with similarity alignment. Ensemble prediction with AdaBoost and Bagging gives good experimental results. The main work is as follows. Three feature extraction algorithms are introduced: amino acid composition, dipeptide composition, and pseudo amino acid composition. In addition to the public datasets ZD98 and CH317, a new dataset, Gram1253, is constructed. The traditional KNN classifier is improved by using BLAST alignment to find the most similar sequences and make the KNN decision, yielding a new prediction algorithm, similarity-alignment KNN. Under the Jackknife test, its success rates on the three datasets are 93.9%, 91.5%, and 92.5%, respectively. The Hadoop distributed computing framework is then introduced to optimize the algorithm. To study the prediction algorithm further, multiple similarity-alignment KNN classifiers are combined with AdaBoost and Bagging to predict the subcellular locations of protein sequences. Under the Jackknife test, AdaBoost achieves success rates of 94.9%, 92.4%, and 93.1% on the three datasets, respectively. Because the classes in ZD98 and CH317 are unevenly distributed, the accuracy of the Bagging ensemble on these two datasets, 89.8% and 87.7%, is lower than that of similarity-alignment KNN. On Gram1253, however, Bagging performs well, reaching an accuracy of 92.9%. The results indicate that AdaBoost and Bagging ensemble classification is a fairly effective approach to protein subcellular location prediction.
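As a rough illustration of two pieces named in the abstract, the sketch below computes amino acid composition features and runs a Jackknife (leave-one-out) test with a plain KNN classifier. It is not the thesis's implementation: Euclidean distance on composition vectors stands in for the BLAST similarity score, and the function names, toy sequences, and labels are all hypothetical.

```python
from collections import Counter
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the 20-dimensional residue-frequency vector of a sequence."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [counts[a] / total for a in AMINO_ACIDS]


def knn_predict(query, train, k=1):
    """Majority vote among the k training items closest to `query`.

    Assumption: Euclidean distance replaces the BLAST similarity
    used in the thesis, purely to keep the example self-contained.
    """
    neighbors = sorted(train, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def jackknife_accuracy(samples, k=1):
    """Leave-one-out test: predict each sample from all the others."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        hits += knn_predict(features, rest, k) == label
    return hits / len(samples)


# Hypothetical toy sequences and labels, not taken from ZD98, CH317, or Gram1253.
toy = [
    ("AAAAALLLLLVVVV", "membrane"),
    ("AALLLLVVVVIIAA", "membrane"),
    ("LLLAAAVVVIILLA", "membrane"),
    ("DDDEEEKKKRRRDE", "cytoplasm"),
    ("EEEDDKKRRKKDDE", "cytoplasm"),
    ("KKKRRDDEEEDKRE", "cytoplasm"),
]
samples = [(amino_acid_composition(s), y) for s, y in toy]
accuracy = jackknife_accuracy(samples, k=1)
print(f"Jackknife accuracy on toy data: {accuracy:.2f}")
```

Because the Jackknife test holds out one sequence at a time, every sample is used for testing exactly once, which is why it is a common choice for small datasets such as those described above.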
[Degree-granting institution]: Nanjing Agricultural University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: Q51;TP301.6



Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2470952.html

