
Research on Algorithms for Detecting Differentially Expressed Genes Based on RNA-Seq Data

Published: 2018-12-08 12:31
[Abstract]: RNA-Seq (Ribonucleic Acid Sequencing) is a routine experimental technique in modern bioinformatics research. Its main purpose is to screen sequencing data for differentially expressed genes, that is, genes whose expression levels differ between samples. Differential expression analysis studies how the same class of genes is expressed at different developmental stages or in different physiological environments of an organism. It is meaningful both statistically and biologically, and it provides an important theoretical basis for understanding the nature of life processes and for studying the regulation of gene expression. This thesis analyzes the processing pipeline for detecting differentially expressed genes in RNA-Seq data. The main contents are as follows:
(1) Based on the weighted trimmed mean of M-values (TMM) normalization and geometric mean normalization, an improved normalization algorithm is presented that adjusts by the coefficient of variation and the median absolute deviation. First, the data are normalized separately with the TMM method and the geometric mean method. The coefficient of variation of each gene (row) is computed in both normalized data sets, and the two coefficients are compared to select the optimal one, which yields a new data set. The new data are then adjusted by the median absolute deviation to complete the normalization. The experimental results show that the algorithm not only removes technical sequencing error and brings all sequenced samples to the same level, but also has smaller error and higher precision.
(2) Based on the svaseq (Surrogate Variable Analysis for Sequencing) algorithm, an improved svaseq algorithm for removing batch effects is presented. First, a regularized log transformation model and a log transformation model are constructed according to the relevant significance parameter. The model parameters are then estimated by weighted least squares to obtain the residual matrix of the data, and this matrix is factorized to estimate the surrogate variables. The experimental results show that the algorithm removes batch effects from the data more effectively and that the differential expression results also improve to some extent.
(3) Based on the DESeq (Differential Expression Sequencing) algorithm, an improved DESeq algorithm for detecting differentially expressed genes is presented. Assuming that the data follow a negative binomial distribution, the total sequencing count of each sample is first estimated from the improved normalization factors, the mean and variance of the model are computed, and the dispersion parameter is estimated. An exact test is then used for the differential expression analysis. The experimental results show that the algorithm detects differentially expressed genes more effectively and improves accuracy by 6.9%.
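To make the normalization step in (1) more concrete, the following is a minimal sketch of the standard geometric mean (median-of-ratios) size factor estimate, followed by a per-gene coefficient of variation. It is not the improved algorithm of the thesis, whose details the abstract does not give. The function names `median_of_ratios_size_factors` and `coefficient_of_variation` and the toy count matrix are assumptions made for illustration only.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Estimate one size factor per sample from a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Keep only genes with nonzero counts in every sample so the log is defined.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log of each gene's geometric mean across samples (the pseudo-reference sample).
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor: median over genes of the ratio between a sample and the reference.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

def coefficient_of_variation(values):
    """Per-gene (row-wise) coefficient of variation: sample std / mean."""
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=1)
    std = values.std(axis=1, ddof=1)
    # Genes with zero mean have no defined CV and are reported as NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(mean > 0, std / mean, np.nan)

# Hypothetical toy data: sample 2 was sequenced exactly twice as deeply as sample 1.
counts = np.array([[10, 20], [100, 200], [30, 60], [5, 10]])
size_factors = median_of_ratios_size_factors(counts)
normalized = counts / size_factors
cv = coefficient_of_variation(normalized)
```

In this toy case the only difference between the two samples is sequencing depth, so dividing by the size factors brings both columns to the same level and the per-gene coefficient of variation drops to zero. With real data, comparing such coefficients across two normalization methods is one plausible reading of the selection step described in (1).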
[Degree-granting institution]: Dalian Maritime University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: Q811.4



Article link: http://sikaile.net/shoufeilunwen/benkebiyelunwen/2368351.html

