K-mer Frequency Analysis of Genomic Sequences and Theoretical Prediction and Verification of Nucleosome-Binding Motifs
Published: 2018-08-30 17:57
[Abstract]: The occurrence frequencies of k-mers in genomic sequences show evolutionary separation. Building on this phenomenon, we analyzed how k-mer (k ≤ 8) usage frequencies differ between nucleosome core sequences and nucleosome linker sequences in the yeast genome. We also analyzed the trimodal distribution of 8-mer usage frequencies in the intergenic sequences of human chromosome 1 and its behavior under XY dinucleotide classification, proposed a theoretically predicted set of nucleosome-binding motifs, and compared it with experimental nucleosome occupancy data. The specific contents are as follows.

Using the single-base-resolution nucleosome positioning annotation of the yeast genome obtained experimentally by Brogaard et al., we extracted all nucleosome core (center) sequences and nucleosome linker sequences. We compared the relative usage frequency (RF) of k-mers (k = 4, 5, 6 and 8) in the two classes of sequences and found that for k ≥ 6 a small number of high-frequency k-mers are used markedly differently. We then introduced the logarithm of the ratio of the two classes' relative usage frequencies (LRF) and ranked the motifs in ascending order of this value. The longer the motif, the more pronounced the usage difference between the two classes, and the distribution of differences gradually stabilizes from around k = 7. Ranking the motifs in ascending order of 8-mer relative usage frequency in the core sequences showed that the difference in 8-mer usage between the two classes is more significant in the region where the relative usage frequency is below 0.5. We calculated the G+C content and dinucleotide content of core-preferred and linker-preferred 8-mers near 7 sampling points. As the relative frequency of the 8-mers decreases, the G+C content of the corresponding motifs increases; linker sequences prefer the GG and CC dinucleotides, whereas core sequences clearly prefer CG and GC. In summary, apart from a few extremely preferred motifs, most of the differences in k-mer usage between the two classes occur in motifs with very low relative frequency, and these motifs have relatively high G+C content.

Theoretical prediction of the nucleosome-binding motif set is important for a comprehensive understanding of nucleosome positioning and chromatin remodeling, as well as the structure and evolution of DNA sequences. To explain the trimodal distribution of the relative motif number of 8-mers as a function of frequency in human genomic sequences, we classified the 8-mer set by the number of CG dinucleotides each 8-mer contains. The three 8-mer subsets (0CG, 1CG and 2CG) each form an independent unimodal distribution, whereas classification by any of the other 15 dinucleotides shows no such behavior; the three peaks of the overall 8-mer distribution are exactly the superposition of the distributions of these three CG subsets. Based on this distinctive property of 8-mer usage in DNA sequences, together with conclusions from experimental studies of nucleosome-binding sequences, we put forward the theoretical conjecture that the 1CG motif set is the set of nucleosome-binding motifs. To test the conjecture, we calculated the relative frequencies of preferred and rare trinucleotides in the 1CG 8-mer set, constructed the nucleosome characteristic parameters Ktri(O) and Ktri(R) from them, obtained the distributions of these parameters over gene transcription start site (TSS) sequences, and compared them with the experimentally determined nucleosome occupancy distribution. Linear fitting shows that sequences with a confidence level above 95% account for 89.2% of the total, and those with a confidence level above 99% account for 81.6%. The comparison supports the conjecture that the 1CG motif set is the set of nucleosome-binding motifs.
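The two quantities the abstract relies on, the log ratio of relative usage frequencies (LRF) between core and linker sequences and the grouping of 8-mers by CG dinucleotide count, can be illustrated with a minimal Python sketch. Several details here are assumptions, not taken from the thesis: RF is normalized so that a value of 1 corresponds to uniform usage (observed share divided by 4^-k), a pseudocount avoids division by zero, the logarithm is natural, and the 2CG class is treated as "two or more CG". The function names and toy sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def kmer_counts(seqs, k):
    """Count overlapping k-mers (A/C/G/T only) across a list of sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip windows containing N etc.
                counts[kmer] += 1
    return counts


def relative_frequency(counts, k, pseudocount=1.0):
    """Assumed RF: share of each k-mer divided by the uniform share 4**-k.

    With this normalization the mean RF over all 4**k k-mers is 1.
    """
    n_kmers = 4 ** k
    total = sum(counts.values()) + pseudocount * n_kmers
    rf = {}
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        rf[kmer] = (counts[kmer] + pseudocount) * n_kmers / total
    return rf


def lrf(core_seqs, linker_seqs, k):
    """Assumed LRF: natural log of RF(core) / RF(linker) for every k-mer."""
    rf_core = relative_frequency(kmer_counts(core_seqs, k), k)
    rf_link = relative_frequency(kmer_counts(linker_seqs, k), k)
    return {m: math.log(rf_core[m] / rf_link[m]) for m in rf_core}


def cg_class(kmer):
    """CG dinucleotide count of a k-mer, capped at 2 (assumed 0CG/1CG/2CG)."""
    return min(kmer.upper().count("CG"), 2)


# Toy sequences, made up for illustration only.
core = ["ACGCGTACGC"]
linker = ["AATTAATTAA"]
scores = lrf(core, linker, 2)
ranked = sorted(scores, key=scores.get)  # ascending LRF, as in the abstract
print(ranked[0], ranked[-1])
print(cg_class("AACGTTTT"), cg_class("ACGTACGT"))
```

Sorting by ascending LRF places linker-preferred motifs first and core-preferred motifs last. Histogramming 8-mer frequencies separately for each `cg_class` value is one way to check whether the three subsets form separate peaks.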
[Degree-granting institution]: Inner Mongolia University
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: Q343.23
Article ID: 2213844
Article link: http://sikaile.net/shoufeilunwen/jckxbs/2213844.html