Basic Bioinformatics Research on Proteome Expression Profiles and a Study of Whole-Proteome Isoelectric Point Distribution
Published: 2019-03-30 21:21
[Abstract]: Human genome research has given us a map of the human genome, but the functional information that the map itself can provide is very limited. To systematically explain the functions of the encoded genes and the relationships among them, proteome research has attracted growing attention.

The most fundamental goal of proteome research is to establish proteome expression profiles for the tissues or organs of an organism and then to systematically clarify the biological significance of those profiles. A variety of instruments can now identify proteins at relatively high throughput, and these technologies have laid the foundation for large-scale proteome expression profiling. Compared with the rapid development of the instruments, however, the data processing, integration, and analysis methods suited to large-scale proteome expression profiling lag markedly behind. Although individual expression profiling studies have established some methods for data processing and integration, there is still no systematic, comprehensive system for data processing and integration. The reliability of protein identification remains a difficulty in mass spectrometry-based identification, and a gap that is hard to bridge still separates the data produced by mass spectrometry from the systems-biology analysis of the final expression profile.

To further improve the reliability of protein identification, to close the gap between mass spectrometry-based identification and expression profile analysis, and to make biological analysis of the finally identified proteins easier, we built on a thorough survey of existing research and a detailed analysis of the requirements of proteome expression profiling. On that basis we established multiple strategies that strengthen the reliability of identification results, and we provided basic annotation information on the identified proteins for subsequent biological analysis.

For organisms whose genome sequence is known, database searching is the most economical and effective protein identification method. To obtain as many highly reliable identifications as possible, we adopted a stepwise search strategy. First, basic identification is completed by searching a non-redundant database of relatively high quality and broad coverage. Second, to make full use of the mass spectrometry data, we established a strategy of stepwise searching against other databases (both protein and nucleic acid databases), which completed the supplementary identification of the mass spectrometry data and the mining of new proteins.

Because database searching is a pattern-matching strategy, the mass spectrometry identification results are often not especially precise, and the databases contain many similar proteins or peptides, so a single mass spectrometry identification result may sometimes match more than one protein or peptide. To make full use of these data and to describe accurately the imprecision of peptide and protein identification, we established a Group model of peptide and protein identification.

For peptide mass fingerprinting results, given the particular nature of the data, statistical methods are first used to obtain the sample…
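The abstract does not spell out how the Group model is defined, so the following is only a minimal sketch of one common way to represent ambiguous matches, not the thesis's actual method. It assumes that proteins supported by exactly the same set of peptides cannot be told apart and are therefore reported together. The function name `group_proteins` and the peptide and protein identifiers are hypothetical.

```python
from collections import defaultdict


def group_proteins(peptide_to_proteins):
    """Group proteins that are supported by exactly the same set of peptides.

    Illustrative assumption only: the source does not define its Group model.
    """
    # Invert the mapping: protein -> set of peptides that matched it.
    protein_to_peptides = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            protein_to_peptides[protein].add(peptide)

    # Proteins with identical peptide evidence are indistinguishable,
    # so they are collected into one group keyed by that evidence.
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)

    # Return (group members, shared peptides) pairs in a stable order.
    return sorted(
        (sorted(members), sorted(peptides)) for peptides, members in groups.items()
    )


# Hypothetical peptide-to-protein matches (made-up identifiers).
matches = {
    "PEPTIDEA": ["P1", "P2"],
    "PEPTIDEB": ["P1", "P2"],
    "PEPTIDEC": ["P3"],
}

for members, peptides in group_proteins(matches):
    print(members, peptides)
```

In this toy input, `P1` and `P2` share all of their peptide evidence and fall into one group, while `P3` stands alone. A real pipeline would also need to handle proteins whose peptide sets only partly overlap, which this sketch leaves as separate groups.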
[Degree-granting institution]: Academy of Military Medical Sciences of the Chinese People's Liberation Army
[Degree level]: Doctoral
[Year degree awarded]: 2005
[Classification number]: Q51
Article ID: 2450499
Article link: http://sikaile.net/yixuelunwen/binglixuelunwen/2450499.html