Research on Algorithms for Detecting Differentially Expressed Genes Based on RNA-Seq Data

Published: 2018-12-08 12:31

[Abstract]: RNA-Seq (ribonucleic acid sequencing) is a routine experimental technique in modern bioinformatics research. Its main purpose is to screen sequencing data for differentially expressed genes, that is, to detect genes whose expression levels differ between samples. Differential expression analysis studies how the same genes are expressed differently in an organism at different developmental stages or under different physiological conditions. It is significant both statistically and biologically, and it provides an important theoretical basis for understanding the nature of life processes and for studying the regulation of gene expression. This thesis analyzes the processing pipeline for detecting differentially expressed genes in RNA-Seq data. The main contributions are as follows.

(1) Building on weighted Trimmed Mean of M-values (TMM) normalization and geometric-mean normalization, an improved normalization algorithm based on coefficient-of-variation selection and median absolute deviation (MAD) adjustment is proposed. First, the data are normalized separately with the TMM method and the geometric-mean method, and the coefficient of variation (CV) of each gene (row) is computed in both normalized data sets. The two CVs are compared to select the optimal one, which yields a new data set; this new data set is then adjusted by the median absolute deviation to complete the normalization. Experimental results show that the algorithm not only removes technical errors introduced by sequencing and brings all samples to the same level, but also yields smaller error and higher precision.

(2) Building on the svaseq (surrogate variable analysis for sequencing data) algorithm, an improved svaseq algorithm for removing batch effects is proposed. First, a regularized-logarithm (rlog) transformation model and a logarithm transformation model are constructed according to the relevant significance parameters. The model parameters are then estimated by weighted least squares to obtain the residual matrix of the data, and this matrix is factorized to estimate the surrogate variables. Experimental results show that the algorithm removes batch effects from the data more effectively and also improves the differential expression results to some extent.

(3) Building on the DESeq (differential expression analysis for sequence count data) algorithm, an improved DESeq algorithm for detecting differentially expressed genes is proposed. Assuming the data follow a negative binomial model, the sequencing depth of each sample is first estimated from the improved normalization factors; the mean and variance of the model are then computed and the dispersion parameter is estimated; finally, an exact test is used for differential expression analysis. Experimental results show that the algorithm detects differentially expressed genes more effectively and improves accuracy by 6.9%.
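The normalization step in contribution (1) can be illustrated with a minimal Python sketch, written under stated assumptions rather than taken from the thesis. It computes simplified TMM factors (after Robinson and Oshlack, with sample 0 assumed as the reference and trimming done by quantile rather than edgeR's rank rule) and geometric-mean ("median-of-ratios") size factors. As an assumed reading of "comparing the two coefficients of variation", it keeps, for each gene, whichever normalized row has the smaller CV. The MAD adjustment is omitted because the abstract does not specify it. All function names are hypothetical.

```python
import numpy as np


def median_of_ratios_size_factors(counts):
    """Geometric-mean ("median-of-ratios") size factors as used by DESeq.

    counts: genes x samples array of raw read counts.
    """
    counts = np.asarray(counts, dtype=float)
    positive = np.all(counts > 0, axis=1)            # skip genes with any zero count
    log_counts = np.log(counts[positive])
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))


def tmm_factors(counts, ref=0, trim_m=0.30, trim_a=0.05):
    """Simplified TMM factors relative to sample `ref`, rescaled to geometric mean 1."""
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=0)
    factors = np.ones(counts.shape[1])
    for k in range(counts.shape[1]):
        if k == ref:
            continue
        yk, yr = counts[:, k], counts[:, ref]
        ok = (yk > 0) & (yr > 0)
        yk, yr = yk[ok], yr[ok]
        pk, pr = yk / lib[k], yr / lib[ref]
        m = np.log2(pk / pr)                         # log fold change (M value)
        a = 0.5 * np.log2(pk * pr)                   # average log expression (A value)
        var = (lib[k] - yk) / (lib[k] * yk) + (lib[ref] - yr) / (lib[ref] * yr)
        # Trim extreme M and A values by quantile (edgeR trims by rank instead).
        keep = ((m >= np.quantile(m, trim_m)) & (m <= np.quantile(m, 1 - trim_m))
                & (a >= np.quantile(a, trim_a)) & (a <= np.quantile(a, 1 - trim_a)))
        # Inverse-variance weighted mean of the remaining M values.
        factors[k] = 2.0 ** (np.sum(m[keep] / var[keep]) / np.sum(1.0 / var[keep]))
    return factors / np.exp(np.mean(np.log(factors)))


def normalize_by_cv(counts):
    """Per gene, keep the TMM- or geometric-mean-normalized row with the lower CV."""
    counts = np.asarray(counts, dtype=float)
    eff_lib = counts.sum(axis=0) * tmm_factors(counts)
    tmm_sf = eff_lib / np.exp(np.mean(np.log(eff_lib)))  # put TMM on the count scale
    norm_tmm = counts / tmm_sf
    norm_geo = counts / median_of_ratios_size_factors(counts)

    def cv(x):
        mean = x.mean(axis=1)
        sd = x.std(axis=1, ddof=1)
        return np.divide(sd, mean, out=np.zeros_like(mean), where=mean > 0)

    use_tmm = cv(norm_tmm) <= cv(norm_geo)
    return np.where(use_tmm[:, None], norm_tmm, norm_geo)
```

Rescaling the TMM effective library sizes to a geometric mean of 1 puts both normalized matrices on the same count scale, so rows taken from either one can be mixed in a single output matrix.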
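Contribution (2) estimates surrogate variables by factorizing the residuals of a weighted least-squares fit. The Python sketch below illustrates that general recipe under stated assumptions. It uses only a plain log2(count + 1) transform; the regularized-log branch and the significance parameter that selects between the two models are not reproduced. Per-sample weights are an optional input because the abstract does not say how they are chosen, and the number of surrogate variables is passed in rather than estimated. The leading right singular vectors of the residual matrix are returned as surrogate variables. The function name is hypothetical, and this is neither the sva package's API nor the thesis's improved algorithm.

```python
import numpy as np


def estimate_surrogate_variables(counts, design, n_sv, weights=None):
    """Surrogate variables from the SVD of weighted least-squares residuals.

    counts:  genes x samples raw counts
    design:  samples x p model matrix for the variables of interest
    weights: optional per-sample weights (uniform if None)
    Returns a samples x n_sv matrix.
    """
    y = np.log2(np.asarray(counts, dtype=float) + 1.0)   # genes x samples
    design = np.asarray(design, dtype=float)
    w = np.ones(y.shape[1]) if weights is None else np.asarray(weights, dtype=float)
    sw = np.sqrt(w)
    # Solve all genes at once: minimise sum_j w_j * (y_ij - design_j . b_i)**2.
    coef, *_ = np.linalg.lstsq(design * sw[:, None], (y * sw).T, rcond=None)
    residuals = y - (design @ coef).T                    # genes x samples
    # Factorise the residual matrix; leading right singular vectors are the SVs.
    _, _, vt = np.linalg.svd(residuals, full_matrices=False)
    return vt[:n_sv].T
```

The returned columns are typically appended to the design matrix of the downstream differential expression model as additional covariates.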
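Contribution (3) rests on a negative binomial model and an exact test. The following Python sketch shows a simplified per-gene version in the spirit of the DESeq exact test, under stated assumptions: counts are assumed already divided by size factors, and the dispersion comes from a method-of-moments estimate (from Var = μ + φμ², averaged over conditions) instead of DESeq's fitted mean-dispersion relationship. The test conditions on the total count and uses the fact that the sum of n independent NB(μ, φ) counts is NB(nμ, φ/n). Function names are hypothetical, and this is not the thesis's improved DESeq.

```python
import math

import numpy as np


def nb_logpmf(k, mean, dispersion):
    """log P(K = k) for a negative binomial with Var = mean + dispersion * mean**2."""
    r = 1.0 / dispersion                               # NB "size" parameter
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            - r * math.log1p(mean / r) + k * math.log(mean / (r + mean)))


def moment_dispersion(*groups):
    """Method-of-moments dispersion phi = (s**2 - mean) / mean**2, averaged over groups."""
    estimates = []
    for g in groups:
        g = np.asarray(g, dtype=float)
        mean, var = g.mean(), g.var(ddof=1)
        if mean > 0:
            estimates.append((var - mean) / mean ** 2)
    return max(float(np.mean(estimates)), 1e-8) if estimates else 1e-8


def exact_test(norm_a, norm_b, dispersion):
    """Two-sided exact test for one gene between conditions A and B.

    norm_a, norm_b: size-factor-normalized counts of the gene in each condition.
    """
    n_a, n_b = len(norm_a), len(norm_b)
    k_a, k_b = int(round(sum(norm_a))), int(round(sum(norm_b)))
    total = k_a + k_b
    if total == 0:
        return 1.0
    mu = total / (n_a + n_b)                           # pooled per-sample mean under H0
    log_p = np.array([nb_logpmf(a, n_a * mu, dispersion / n_a)
                      + nb_logpmf(total - a, n_b * mu, dispersion / n_b)
                      for a in range(total + 1)])
    p = np.exp(log_p - log_p.max())                    # rescale to avoid underflow
    observed = p[k_a]
    # Sum the probabilities of all splits no more likely than the observed one.
    return float(min(1.0, p[p <= observed * (1 + 1e-7)].sum() / p.sum()))
```

Working with log-probabilities and rescaling by the maximum keeps the computation stable for genes with large counts, where individual probabilities would otherwise underflow.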
[Degree-granting institution]: Dalian Maritime University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: Q811.4
