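Prediction of High-Confidence lincRNAs in Mouse Embryonic Stem Cells and Study of Their Regulatory Patterns
Published: 2018-01-16 05:15
Source: Harbin Institute of Technology, 2017 doctoral dissertation. Document type: degree thesis
Keywords: lincRNA; ES cells; RNA-Seq; interaction network; histone modification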
[Abstract]: lincRNAs function in metabolism, growth and development, and disease, and they regulate gene expression at multiple levels. As key regulators, lincRNAs play important roles in mouse ES cells. This work uses high-throughput RNA-Seq data to identify unannotated, high-confidence lincRNA transcripts expressed in mouse ES cells and thereby improve the genome annotation of lincRNAs. It also identifies the characteristic regulatory patterns of enhancer-associated and promoter-associated lincRNAs, identifies interactions between elincRNAs and promoters, and studies how lincRNAs regulate gene expression.
The thesis integrates multiple RNA-Seq datasets from mouse ES cells, early embryos, and whole embryos, and identifies 6,701 novel lincRNAs expressed in mouse ES cells. Assessing transcript completeness with RNA-Seq read coverage and CAGE shows that the novel lincRNAs identified from RNA-Seq are incomplete transcripts that lack their 5′ ends. Analysis of the TSS regions of known lincRNAs and protein-coding transcripts shows that lincRNAs have specific genomic and epigenomic features. Evaluation by ten-fold cross-validation and on an independent test set shows that the lincRNA TSS-region prediction model that integrates genomic and epigenomic features performs best. lincRNA TSS regions were predicted across the mouse genome, and the TSS regions of 1,293 novel lincRNAs were corrected. Evaluating the TSS regions before and after correction with CAGE and active chromatin modifications shows that the predicted TSS regions yield relatively complete lincRNA transcripts in mouse ES cells.
Genomic distribution analysis and genomic and epigenomic characterization show that the novel lincRNAs resemble known lincRNAs: compared with protein-coding transcripts they have fewer exons, shorter transcripts, and lower conservation, and they are enriched in repetitive elements. The epigenetic modification patterns of lincRNAs also differ significantly from those of protein-coding transcripts. RT-PCR measurement of the novel lincRNAs in different cell lines and in different tissues at different mouse developmental stages indicates tissue- and cell-specific expression. RACE experiments were then used to determine the full length of transcript TCONS_00041333; the results show that this lincRNA comprises two transcripts, 656 bp and 571 bp long. Analysis of core promoter element binding regions shows that binding regions for the TATA-box, GC-box, CCAAT-box, and Initiator lie upstream of its TSS, which is also enriched for the H3K4me1 and H3K27ac histone modifications.
By chromatin state, lincRNAs can be divided into elincRNAs (enhancer-associated lincRNAs) and plincRNAs (promoter-associated lincRNAs). Based on the H3K4me1/H3K4me3 enrichment ratio in the TSS regions of known lincRNA transcripts in mouse ES cells, a high-confidence set of 224 elincRNAs and 112 plincRNAs was identified. Integrating genomic and epigenomic features, a regularized logistic regression model was used to identify the features that significantly distinguish the regulation of elincRNAs and plincRNAs: elincRNAs are associated with DNA methylation in the TSS region and with DNA methylation and H3K122ac in the gene body, whereas plincRNAs are associated with H3K9ac in the TSS region and H3K36me3 in the gene body. The prediction model identified 3,729 elincRNAs and 1,392 plincRNAs. Genomic and epigenomic characterization shows that, compared with plincRNAs, elincRNAs have fewer exons, shorter transcripts, lower expression levels, lower sequence conservation, and distinct chromatin modification patterns.
The regulatory patterns of elincRNA-promoter interactions in mouse ES cells were analyzed using histone modification patterns and transcription factor enrichment patterns; the results show that these interactions tend to be regulated by transcription factors. Evaluation against a high-confidence set of elincRNA-promoter interactions in mouse ES cells shows that the interactions identified by Spearman correlation of transcription factors form the best prediction set. An interaction network was built from the high-confidence set, together with an interaction subnetwork based on transcription factor correlation. Analysis of network topology shows that the subnetwork has properties similar to those of the full interaction network, and that elincRNAs target specific promoters rather than regulating broadly. Module mining and functional enrichment analysis of the subnetwork show that some modules are enriched for RNA polymerase II-binding transcription factors involved in transcriptional activation and for functions related to ES cells and embryonic development. elincRNAs may therefore participate in the transcriptional activation of their target genes.
In summary, this study identifies a set of lincRNAs expressed in mouse ES cells with relatively complete transcript boundaries, uses machine learning models to identify the regulatory features of elincRNAs and plincRNAs, and identifies interactions between elincRNAs and their target promoters in mouse ES cells. Beyond discovering and studying lincRNAs important in mouse development, the work is significant for the systematic study of how lincRNAs regulate gene expression in early embryonic development.
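The abstract states that elincRNAs and plincRNAs were separated by the H3K4me1/H3K4me3 enrichment ratio at the TSS region, but it gives no cutoff. The following is a minimal sketch of that idea only, not the thesis's actual procedure: the function name `classify_lincrna`, the log2 cutoff of 1.0, and the pseudocount of 1.0 are all illustrative assumptions.

```python
import math


def classify_lincrna(h3k4me1, h3k4me3, log2_cutoff=1.0, pseudocount=1.0):
    """Label a lincRNA TSS region from two histone-mark enrichment values.

    h3k4me1, h3k4me3: non-negative enrichment signals at the TSS region.
    log2_cutoff and pseudocount are hypothetical values chosen for
    illustration; the abstract does not report the thresholds used.
    """
    # The pseudocount avoids division by zero and damps noise at low signal.
    log_ratio = math.log2((h3k4me1 + pseudocount) / (h3k4me3 + pseudocount))
    if log_ratio >= log2_cutoff:
        return "elincRNA"   # H3K4me1-dominant, enhancer-like chromatin
    if log_ratio <= -log2_cutoff:
        return "plincRNA"   # H3K4me3-dominant, promoter-like chromatin
    return "ambiguous"      # left out of a high-confidence set


if __name__ == "__main__":
    for me1, me3 in [(15.0, 1.0), (1.0, 15.0), (5.0, 5.0)]:
        print(me1, me3, classify_lincrna(me1, me3))
```

Leaving an "ambiguous" band between the two cutoffs is one plausible way to obtain a small high-confidence set such as the 224 elincRNAs and 112 plincRNAs reported above, with the remaining transcripts assigned later by a trained model.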
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Doctoral
[Year degree granted]: 2017
[Classification number]: R3416
Article ID: 1431690
Article link: http://sikaile.net/shoufeilunwen/yxlbs/1431690.html