天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

當前位置:主頁 > 科技論文 > 自動化論文 >

Research on Protein Subnuclear Localization Based on Feature Fusion and Dimensionality Reduction Algorithms

Published: 2018-07-03 10:55

Topics: protein subnuclear localization, fused representation; Source: Yunnan University, 2016 master's thesis


[Abstract]: With the completion of human genome sequencing and the spread of high-throughput sequencing technology, protein sequences are being generated in large numbers, and determining the functions of newly sequenced proteins has become one of the central topics in bioinformatics. Proteins carry out their biological activities inside the cells of an organism, so a protein's subcellular and subnuclear localization is closely related to its function; subnuclear localization information also provides useful clues for the prevention, diagnosis, and treatment of diseases such as genetic disorders and cancer. Obtaining this information through biological experiments, however, costs a great deal of time and money. With the rapid development of computer science, predicting protein subnuclear localization with machine learning has become an active area of bioinformatics research, and machine-learning-based localization methods predict quickly and at low cost. This thesis studies the protein subnuclear localization problem in depth using machine learning. It first reviews the basic knowledge of protein subnuclear localization, the background and significance of the problem, and the current state of research, and describes the main research topics in detail. It then discusses protein sequence feature representations and the choice of classifiers from several angles, summarizes the problems of current sequence representation methods, and finally identifies where this thesis seeks a breakthrough.
First, a protein subnuclear localization method based on feature fusion and supervised locality preserving projection (SLPP) is proposed. Traditional feature representations extract protein features from only a single aspect of the sequence information, and classification models built on them are designed without analyzing the data distribution of the sequence representation, so feature representation and classification remain largely isolated from each other. The proposed method therefore first fuses representations that carry complementary sequence information, obtaining a fused representation with highly discriminative information. It then uses SLPP to learn a low-dimensional manifold of the data and reduces the dimensionality of the fused representation, yielding low-dimensional discriminative features in which the classes are separated from one another while the structure within each class is preserved. Based on this data distribution, a k-nearest neighbor (KNN) classifier is chosen to predict the subnuclear location of each sequence. In multiple comparative experiments on two benchmark datasets, the method achieves high prediction accuracy. Because it exploits the complementary information contained in traditional sequence representations and accounts for the relationship between the data distribution of the representation and the classification model, the method substantially improves overall prediction accuracy. However, it ignores the differences among proteins at different subnuclear locations, which motivates the second contribution of this thesis.
Second, a protein subnuclear localization method based on efficient fused representations and linear discriminant analysis (LDA) is proposed. Different feature representations contain different sequence information and therefore contribute differently to subnuclear localization, and proteins at different subnuclear locations perform different functions. By modeling these location-specific differences in detail, the method fuses the feature data of each subnuclear location to a different degree and constructs two high-dimensional fused representations containing highly discriminative information; a genetic algorithm is used to obtain the feature fusion coefficients for each subnuclear location. Because the resulting fused representations are high-dimensional and contain redundant information, LDA is used to reduce their dimensionality, the dimension giving the highest subnuclear localization accuracy is selected, and the corresponding protein subnuclear localization classifier is built. Extensive experiments on two benchmark datasets show that the proposed method achieves high prediction accuracy and that the classifier performs well.
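The abstract does not name the sequence representations that are fused. As a minimal sketch, assuming two representations common in this field, amino acid composition (AAC) and dipeptide composition (DPC), serial fusion can be illustrated by scaling each representation with a coefficient and concatenating the results. The function names `aac`, `dpc`, and `fuse` and the example sequence are hypothetical and not taken from the thesis.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                               # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 residue pairs


def aac(seq):
    """Amino acid composition: relative frequency of each standard residue (20-D)."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: relative frequency of each adjacent residue pair (400-D)."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
    return [counts[p] / (len(seq) - 1) for p in DIPEPTIDES]


def fuse(seq, weights=(1.0, 1.0)):
    """Hypothetical serial fusion: scale each representation by a coefficient, then concatenate (420-D)."""
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    w_aac, w_dpc = weights
    return [w_aac * v for v in aac(seq)] + [w_dpc * v for v in dpc(seq)]


seq = "MKTAYIAKQR"  # arbitrary toy fragment, not from the thesis datasets
vec = fuse(seq)
print(len(vec))  # 420
```

Concatenation keeps the complementary information of both views in one vector; the `weights` argument is where fusion coefficients, such as those the second method searches for, would enter.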
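The first method reduces the fused representation with supervised locality preserving projection and then classifies with KNN. A minimal NumPy sketch of that pipeline follows. It assumes one common SLPP variant, in which heat-kernel weights connect only samples of the same class, preceded by a PCA step as in the original LPP formulation; the thesis's graph construction, kernel width, and value of k are not given in the abstract. `slpp`, `knn_predict`, and the synthetic two-class data are illustrative assumptions.

```python
import numpy as np


def slpp(X, y, n_components=2, t=None, reg=1e-9):
    """Supervised LPP (assumed variant): heat-kernel weights between same-class samples only.
    Returns (mean, P); new data is projected as (X_new - mean) @ P."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA step: restrict to the data subspace so the constraint matrix is not singular.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[s > 1e-10 * s[0]].T                   # d x r orthonormal basis
    Z = Xc @ V                                    # n x r
    # Heat-kernel affinities, zeroed for pairs from different classes.
    sq = np.sum(Z ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    if t is None:
        t = dist2[~np.eye(len(Z), dtype=bool)].mean()  # bandwidth: mean squared distance
    W = np.where(y[:, None] == y[None, :], np.exp(-dist2 / t), 0.0)
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))
    L = D - W                                     # graph Laplacian
    # Generalized eigenproblem Z^T L Z b = lambda Z^T D Z b; keep the smallest eigenvalues.
    A = Z.T @ L @ Z
    B = Z.T @ D @ Z
    B += reg * np.trace(B) / len(B) * np.eye(len(B))
    C_inv = np.linalg.inv(np.linalg.cholesky(B))  # whiten with the Cholesky factor of B
    M = C_inv @ A @ C_inv.T
    _, evecs = np.linalg.eigh((M + M.T) / 2)      # eigenvalues in ascending order
    return mean, V @ C_inv.T @ evecs[:, :n_components]


def knn_predict(train_F, train_y, test_F, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    train_y = np.asarray(train_y)
    preds = []
    for f in np.atleast_2d(test_F):
        nearest = np.argsort(np.sum((train_F - f) ** 2, axis=1))[:k]
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)


# Synthetic toy data (not from the thesis): two classes in 10-D differing only along axis 0.
rng = np.random.default_rng(0)


def make_data(n_per_class):
    X = rng.normal(size=(2 * n_per_class, 10))
    y = np.repeat([0, 1], n_per_class)
    X[y == 1, 0] += 6.0
    return X, y


X_train, y_train = make_data(20)
X_test, y_test = make_data(20)
mean, P = slpp(X_train, y_train, n_components=1)
pred = knn_predict((X_train - mean) @ P, y_train, (X_test - mean) @ P, k=3)
print("test accuracy:", (pred == y_test).mean())
```

Because only same-class pairs carry weight, projections that keep each class compact relative to the overall spread of the data tend to score the lowest objective, which matches the "classes separated, within-class structure preserved" property the abstract describes.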
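The second method obtains its fusion coefficients with a genetic algorithm. The abstract gives no details of the encoding, operators, or fitness function, and its coefficients are specific to each subnuclear location. The sketch below is a deliberate simplification: it searches one weight per feature block, shared by all classes, and uses leave-one-out 1-nearest-neighbor accuracy as an assumed fitness. Tournament selection, arithmetic crossover, Gaussian mutation, and elitism are standard GA choices assumed here; all names and data are hypothetical.

```python
import numpy as np


def loo_1nn_accuracy(F, y):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    y = np.asarray(y)
    sq = np.sum(F ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * F @ F.T
    np.fill_diagonal(d2, np.inf)  # a sample may not be its own neighbor
    return float(np.mean(y[np.argmin(d2, axis=1)] == y))


def fuse_blocks(blocks, w):
    """Weighted serial fusion: scale each feature block by its coefficient, then concatenate."""
    return np.hstack([wi * B for wi, B in zip(w, blocks)])


def ga_fusion_weights(blocks, y, pop_size=20, generations=30, p_mut=0.2, seed=0):
    """Search block weights in [0, 1] that maximize the (assumed) LOO 1-NN fitness."""
    rng = np.random.default_rng(seed)
    m = len(blocks)

    def fitness(w):
        return loo_1nn_accuracy(fuse_blocks(blocks, w), y)

    pop = rng.random((pop_size, m))
    pop[0] = 1.0  # seed the population with plain, unweighted fusion
    fit = np.array([fitness(w) for w in pop])
    for _ in range(generations):
        children = [pop[np.argmax(fit)].copy()]  # elitism: carry over the best individual
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            i, j = rng.integers(pop_size, size=2), rng.integers(pop_size, size=2)
            p1, p2 = pop[i[np.argmax(fit[i])]], pop[j[np.argmax(fit[j])]]
            alpha = rng.random(m)
            child = alpha * p1 + (1 - alpha) * p2  # arithmetic crossover
            mutate = rng.random(m) < p_mut
            child[mutate] += rng.normal(0.0, 0.1, size=mutate.sum())  # Gaussian mutation
            children.append(np.clip(child, 0.0, 1.0))
        pop = np.array(children)
        fit = np.array([fitness(w) for w in pop])
    best = np.argmax(fit)
    return pop[best], fit[best]


# Synthetic toy data: one informative block and one large-scale noise block.
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 15)
informative = rng.normal(size=(45, 3)) + 3.0 * np.eye(3)[y]  # class-dependent means
noisy = 5.0 * rng.normal(size=(45, 20))                       # uninformative features
w, acc = ga_fusion_weights([informative, noisy], y)
print("weights:", w.round(2), "LOO 1-NN accuracy:", acc)
```

Keeping the best individual in every generation guarantees that the best fitness never decreases, so the returned weights score at least as well as plain unweighted fusion under this fitness.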
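The last step of the second method, reducing the fused representation with LDA and keeping the dimension that gives the highest prediction accuracy, can be sketched as below. Fisher LDA yields at most (number of classes - 1) discriminant directions, so the sweep only has to try that many dimensions. The held-out validation split, the 1-nearest-neighbor scorer, and the ridge term on the within-class scatter are assumptions for illustration; the abstract does not say how accuracy was measured. `lda` and `select_dimension` are hypothetical names.

```python
import numpy as np


def lda(X, y, reg=1e-6):
    """Fisher LDA: directions maximizing between-class over within-class scatter.
    Returns (mean, W) with at most n_classes - 1 columns, most discriminative first."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    classes = np.unique(y)
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                    # within-class scatter
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * diff @ diff.T                    # between-class scatter
    Sw += reg * np.trace(Sw) / d * np.eye(d)             # ridge term keeps Sw invertible
    # Solve Sb w = lambda Sw w by whitening with the Cholesky factor of Sw.
    C_inv = np.linalg.inv(np.linalg.cholesky(Sw))
    M = C_inv @ Sb @ C_inv.T
    evals, evecs = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(evals)[::-1][: len(classes) - 1]  # largest eigenvalues first
    return mean, C_inv.T @ evecs[:, order]


def select_dimension(X_tr, y_tr, X_val, y_val):
    """Keep the LDA dimension with the best validation accuracy of a 1-NN classifier."""
    mean, W = lda(X_tr, y_tr)
    best_k, best_acc = 1, -1.0
    for k in range(1, W.shape[1] + 1):
        Ztr, Zval = (X_tr - mean) @ W[:, :k], (X_val - mean) @ W[:, :k]
        d2 = ((Zval[:, None, :] - Ztr[None, :, :]) ** 2).sum(axis=2)
        acc = float(np.mean(y_tr[np.argmin(d2, axis=1)] == y_val))
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, best_acc


# Synthetic toy data: four classes in 8-D that differ only in the first four features.
rng = np.random.default_rng(2)


def make_data(n_per_class):
    y = np.repeat(np.arange(4), n_per_class)
    X = rng.normal(size=(4 * n_per_class, 8))
    X[:, :4] += 4.0 * np.eye(4)[y]
    return X, y


X_tr, y_tr = make_data(25)
X_val, y_val = make_data(25)
k, acc = select_dimension(X_tr, y_tr, X_val, y_val)
print("selected LDA dimension:", k, "validation accuracy:", acc)
```

Choosing the dimension on held-out data rather than on the training set keeps the selection from simply favoring the largest projection.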
[Degree-Granting Institution]: Yunnan University
[Degree Level]: Master's
[Year Awarded]: 2016
[Classification Number]: Q51; TP181


Article No.: 2093403

Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2093403.html