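A Study of the Diversity of Non-Coded Amino Acids in Human Sperm
Published: 2018-03-16 06:11
Topic: proteome. Focus: non-coded amino acids. Source: Shandong University, 2017 master's thesis. Document type: degree thesis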
[Abstract]: A protein's sequence is usually taken to be determined by translation of the coding sequence in the genome. In practice, however, post-translational modification, amino acid substitution and other processes mean that the residues actually present often differ from those the genome specifies, which alters protein structure and affects protein function. Amino acid residues in organisms are nevertheless rarely determined directly by proteomics, for two main reasons: residues that differ from the encoded amino acids are usually ignored by common search algorithms, and protein sequencing generally depends on a database of proteins theoretically translated from the genome. Assuming that a peptide sequence contains one or more undefined non-coded amino acid residues offers a way to resolve spectra that otherwise cannot be matched to a peptide. In early methods, partial peptide sequences derived from unmatched spectra were used as tags to search the theoretically translated protein database, and the results revealed unexpected post-translational modifications and amino acid substitutions. Later, unrestricted search algorithms were used to identify non-coded amino acid residues without knowing in advance whether they were present. The mass-tolerant approach was originally used to detect known modifications by allowing a mass difference between the precursor and its fragments; it has recently been extended with a wide mass tolerance so that peptide sequences carrying a broad range of mass differences or undefined modifications can be matched, and many modifications have been found this way. The main problems with these methods remain a high false-positive rate, low sensitivity and long search times.

Here we systematically studied all possible amino acid residues in human sperm cells whose relative molecular mass differs from that of the amino acid encoded in the genomic sequence, which we call non-coded amino acids (ncAAs). By measuring the mass difference between the encoded amino acid and the actual protein residue, we found more than one million amino acids with a non-zero mass difference, that is, amino acids with an altered side chain. Gaussian mixture distribution analysis and iterative regression analysis of these mass differences identified 424 high-confidence Gaussian clusters, and a decision tree built with a machine learning algorithm identified 849 high-confidence ncAAs distributed over 35,274 protein sites. Of these, 180 mass-difference clusters correspond to amino acid side-chain structures that have never been reported, and 105 ncAAs matched amino acid substitution types, 40 of which were confirmed by transcriptome sequencing.

In addition, ANOVA showed that some ncAAs have specific distributions among normal individuals, suggesting that these ncAAs may be related to differences between individuals. Other ncAAs are distributed differently in patients with severe oligoasthenozoospermia and in normal individuals, suggesting that these ncAAs are related to the disease mechanism; some of the phosphorylation sites involved have been reported in previous studies. Our study shows that ncAAs are widespread in sperm cells, arising mainly from nucleotide polymorphisms, post-translational modifications and some unknown mechanisms, and that they have important implications for disease diagnosis and targeted drug therapy.
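The mass-difference idea in the abstract can be illustrated with a small sketch. This is not the thesis's pipeline: the function names (`mass_difference`, `explain`), the short list of modification shifts, and the 0.01 Da tolerance are all assumptions chosen for illustration. The residue masses are standard monoisotopic values. The sketch computes the difference between an observed residue mass and the mass of the encoded amino acid, then checks whether that difference matches a known modification or another standard amino acid.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# A few well-known modification mass shifts (Da).
# Illustrative only; a real analysis would use a much larger catalogue.
KNOWN_SHIFTS = {
    "phosphorylation": 79.96633,
    "oxidation": 15.99491,
    "acetylation": 42.01057,
    "methylation": 14.01565,
}


def mass_difference(encoded, observed_mass):
    """Observed residue mass minus the mass of the genome-encoded residue."""
    return observed_mass - RESIDUE_MASS[encoded]


def explain(encoded, observed_mass, tol=0.01):
    """List candidate explanations for an observed residue mass.

    `tol` is a hypothetical absolute tolerance in Da, not a value
    taken from the thesis.
    """
    delta = mass_difference(encoded, observed_mass)
    if abs(delta) <= tol:
        return ["unmodified"]
    hits = []
    # Does the shift match a known modification?
    for name, shift in KNOWN_SHIFTS.items():
        if abs(delta - shift) <= tol:
            hits.append(name)
    # Does the observed mass match a different standard amino acid?
    for aa, mass in RESIDUE_MASS.items():
        if aa != encoded and abs(observed_mass - mass) <= tol:
            hits.append(f"substitution {encoded}->{aa}")
    return hits or ["unexplained"]


if __name__ == "__main__":
    print(explain("S", 166.99836))  # serine carrying a phosphate group
    print(explain("A", 87.03203))   # ambiguous: oxidation or Ala->Ser
```

The second call returns two candidates because oxidized alanine and serine have the same mass. Mass alone cannot always separate a modification from a substitution, so independent evidence, such as the transcriptome sequencing mentioned in the abstract, is useful for confirming substitutions.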
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: R321.1
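[Similar literature]
Related master's theses (top 2)
1 Zhang Chen (張晨); Construction of a UAA-Encoded Amino Acid Expression System [D]; Jilin University; 2017
2 Chen Xinjun (陳新駿); A Study of the Diversity of Non-Coded Amino Acids in Human Sperm [D]; Shandong University; 2017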
Article ID: 1618651
Article link: http://sikaile.net/yixuelunwen/jichuyixue/1618651.html