Research on Promoter Type Identification Based on Next-Generation Sequencing Data
Published: 2018-04-25 23:09
Topics: promoter + histone modification; Source: Harbin Institute of Technology, 2017 master's thesis
[Abstract]: The study of the human genome as a whole has entered the "post-genome era", an era centered on revealing, clarifying, and mining genome function. Driven by the rapid development of sequencing technology, the functional characterization of gene expression products and epigenetic information has entered a new stage of large-scale, high-throughput analysis. The mechanisms of gene expression regulation remain an active research topic, and identifying the types of promoters, which are key elements of the gene expression regulatory network, is a stepping stone toward a deeper understanding of the complex regulatory mechanisms of the human genome. In this work, we first preprocess the annotated gene data to obtain what we call single-gene data. Then, based on RNA-seq data, we compute gene expression in several cell lines (Hepg2, Huvec, Gm12878, K562, and H1hesc) and analyze the gene expression level in each cell line. Next, exploiting the enrichment of RNA polymerase II in promoter regions, we combine Pol II ChIP-seq data with gene expression levels to identify active promoters and poised promoters, examine the expression levels of genes containing different promoter types, and on that basis analyze alternative promoters in the cell lines. Finally, we divide the region spanning 1000 base pairs upstream and downstream of the gene transcription start site into 10 segments of 200 base pairs each, count the distribution of six histone modification signals across these segments in the cell lines H1hesc, Huvec, and Gm12878, and analyze the specificity of histone modification signal distribution in different types of promoter regions. Using the histone modification features of H1hesc as the training set, we train classifiers with machine learning algorithms to predict the types of candidate promoters in Huvec and Gm12878.
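The segmentation step in the abstract (the region 1000 base pairs upstream and downstream of the transcription start site, split into ten 200 base pair segments) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `tss_bins` and `bin_signal` are hypothetical, each read is reduced to a single coordinate, strand is ignored, and the thesis does not say how signal was actually quantified.

```python
from typing import List, Sequence, Tuple


def tss_bins(tss: int, flank: int = 1000, bin_size: int = 200) -> List[Tuple[int, int]]:
    """Return half-open [start, end) bins covering tss - flank up to tss + flank."""
    start = tss - flank
    return [(s, s + bin_size) for s in range(start, tss + flank, bin_size)]


def bin_signal(bins: Sequence[Tuple[int, int]], read_positions: Sequence[int]) -> List[int]:
    """Count how many read positions fall in each bin.

    Assumes equally sized, contiguous bins as produced by tss_bins.
    """
    counts = [0] * len(bins)
    if not bins:
        return counts
    origin = bins[0][0]
    size = bins[0][1] - bins[0][0]
    for pos in read_positions:
        idx = (pos - origin) // size  # floor division; negative for positions left of the window
        if 0 <= idx < len(bins):
            counts[idx] += 1
    return counts


# Hypothetical TSS at coordinate 5000 and a few made-up read positions.
bins = tss_bins(5000)
counts = bin_signal(bins, [4000, 4199, 4200, 5999, 6000, 3999])
```

Repeating this for each of the six histone modifications would give a 60-value feature vector per candidate promoter (10 segments × 6 marks), which is one plausible input shape for the classifier the abstract mentions.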
Article ID: 1803392
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1803392.html