Frequency Analysis of k-mers in Genomic Sequences and Theoretical Prediction and Verification of Nucleosome-Binding Motifs
Published: 2018-08-30 17:57
[Abstract]: The occurrence frequencies of k-mers in genomic sequences show an evolutionary separation phenomenon. Building on this, we analyzed the differences in k-mer (k ≤ 8) usage frequency between nucleosome core sequences and nucleosome linker sequences in the yeast genome. We also analyzed the trimodal distribution of 8-mer usage frequencies in the intergenic sequences of human chromosome 1 and its characteristics under XY dinucleotide classification, derived a theoretically predicted set of nucleosome-binding motifs, and compared it with experimental nucleosome occupancy results. The specific contents are as follows.

Based on the single-base-resolution nucleosome positioning annotation of the yeast genome obtained experimentally by Brogaard et al., we extracted all nucleosome core sequences and nucleosome linker sequences. We analyzed the differences in the relative usage frequency (RF) of k-mers (k = 4, 5, 6 and 8) between the two classes of sequences and found that, for k ≥ 6, a small number of high-frequency k-mers differ markedly in usage. We introduced the logarithm of the ratio of k-mer relative usage frequencies between the two classes (LRF) and ranked the motifs in increasing order of this value. The results show that the longer the motif, the more pronounced the usage difference between the two classes, and that the distribution of differences gradually stabilizes after k = 7. Ranking the motifs in increasing order of the 8-mer relative usage frequency in core sequences, we found that the usage difference between the two classes is more significant in the region where the relative usage frequency is below 0.5. We then calculated the G+C content and dinucleotide content of core-preferred 8-mers and linker-preferred 8-mers near seven sampling points. As the 8-mer relative frequency decreases, the G+C content of the corresponding motifs increases; linker sequences prefer GG and CC dinucleotides, whereas core sequences clearly prefer CG and GC dinucleotides. In summary, apart from a few extremely preferred motifs, most of the k-mer usage differences between the two classes occur in motifs with very low relative frequency, and these motifs have a high G+C content. The theoretical prediction of the nucleosome-binding motif set is of great significance for a comprehensive understanding of nucleosome positioning and chromatin remodeling, as well as the structure and evolution of DNA sequences.

To explain the trimodal distribution of the relative motif number against frequency for 8-mers in human genomic sequences, we classified the 8-mer set by the number of CG dinucleotides each 8-mer contains. The three 8-mer subsets (0CG, 1CG and 2CG) each form an independent unimodal distribution, whereas classification by any of the other 15 dinucleotides shows no such phenomenon; the three peaks of the overall 8-mer distribution are precisely the superposition of the distributions of these three CG subsets. Having analyzed this distinctive property of 8-mer usage in DNA sequences, and drawing on experimental conclusions about nucleosome-binding sequences, we proposed the theoretical conjecture that the 1CG motif set is the set of nucleosome-binding motifs.

To test this conjecture, we calculated the relative frequencies of the preferred and rare trinucleotides in the 1CG 8-mer set, constructed the nucleosome characteristic parameters Ktri(O) and Ktri(R) from them respectively, obtained their distributions over the transcription start site (TSS) sequences of 1177 genes, and compared these with the experimentally determined nucleosome occupancy distributions. The linear-fitting statistics show that sequences with confidence above 95% account for 89.2% of the total, and sequences with confidence above 99% account for 81.6% of the total. The comparison supports the conjecture that the 1CG motif set is the set of nucleosome-binding motifs.
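The k-mer comparison described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything here is an assumption rather than the thesis's method: RF is taken as a k-mer's count divided by the mean count over all 4^k possible k-mers, LRF as the base-2 logarithm of the core-to-linker RF ratio with a small pseudocount, and the 2CG class as "two or more CG dinucleotides". The function names (`count_kmers`, `relative_frequency`, `log_rf_ratio`, `cg_class`) and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2

ALPHABET = "ACGT"


def count_kmers(sequences, k):
    """Count overlapping k-mers, skipping windows with non-ACGT symbols."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if all(base in ALPHABET for base in kmer):
                counts[kmer] += 1
    return counts


def relative_frequency(counts, k):
    """ASSUMED definition: count divided by the mean count over all 4**k k-mers."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no k-mers counted")
    mean = total / 4 ** k
    all_kmers = ("".join(p) for p in product(ALPHABET, repeat=k))
    return {kmer: counts.get(kmer, 0) / mean for kmer in all_kmers}


def log_rf_ratio(rf_core, rf_linker, pseudo=1e-6):
    """ASSUMED LRF: log2 of core RF over linker RF; pseudo avoids log(0)."""
    return {m: log2((rf_core[m] + pseudo) / (rf_linker[m] + pseudo))
            for m in rf_core}


def cg_class(kmer):
    """Label a k-mer 0CG, 1CG or 2CG; ASSUMPTION: 2CG means two or more CGs."""
    return f"{min(kmer.upper().count('CG'), 2)}CG"


# Toy sequences for illustration only; they are not data from the thesis.
core = ["CGCGGCGC", "GCGCCGAT"]
linker = ["AATTATAT", "GGCCAATT"]
k = 2
lrf = log_rf_ratio(relative_frequency(count_kmers(core, k), k),
                   relative_frequency(count_kmers(linker, k), k))
ranked = sorted(lrf, key=lrf.get)  # motifs in increasing order of LRF
print(ranked[:3], ranked[-3:])
print(cg_class("AACGTTAA"), cg_class("CGCGCGCG"))
```

With real data, `core` and `linker` would hold the sequences extracted from a nucleosome positioning annotation and `k` would go up to 8. Sorting by LRF mirrors the increasing-order ranking mentioned in the abstract, and grouping 8-mers by `cg_class` gives the three subsets whose frequency distributions can then be plotted separately.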
[Degree-granting institution]: Inner Mongolia University
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: Q343.23
Article ID: 2213844
Article link: http://sikaile.net/shoufeilunwen/jckxbs/2213844.html