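A Method for Identifying Cancer Driver Pathways Based on Second-Generation Sequencing Data
Published: 2018-04-25 13:10
Topics: second-generation sequencing technology + cancer. Reference: master's thesis, Qufu Normal University, 2016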
[Abstract]: With the development of high-throughput sequencing, researchers can now address a wide range of biological and biomedical questions at genome-wide scale, and in doing so have accumulated massive amounts of biological data. The technologies involved include microarrays (for example gene expression, copy number variation, genome-wide association studies, and methylation sequencing), second-generation sequencing (for example RNA-seq, whole-exome sequencing, and whole-genome sequencing), and ChIP-Seq. Analyzing the data these technologies produce often reveals noteworthy genes, which matters greatly for subsequent biological interpretation and validation. Cancer is usually caused by the accumulation of gene mutations. Recently, second-generation sequencing has produced large volumes of cancer genome data, which has helped researchers develop algorithms for identifying important gene mutations in cancer development. However, these algorithms cannot handle the heterogeneity of genetic aberrations. Many researchers have therefore shifted from studying cancer driver genes to studying the driver pathways that lead to cancer, and identifying driver pathways requires dedicated bioinformatics algorithms. Based on second-generation sequencing data, this thesis focuses on algorithms for identifying cancer driver pathways: it proposes effective driver pathway identification algorithms, describes their key steps in detail, and compares their results with those of traditional algorithms. The work is summarized as follows. First, an improved algorithm is proposed for the "maximum weight submatrix" problem, which identifies driver mutation pathways from two properties of cancer driver pathways: coverage and exclusivity.
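The abstract does not spell out the objective being maximized. As a minimal sketch, assuming the weight function commonly used for the maximum weight submatrix problem in the driver pathway literature, W(M) = 2|Γ(M)| - Σ |Γ(g)| over genes g in M (where Γ(g) is the set of patients mutated in gene g and Γ(M) is the union over the gene set M), coverage and exclusivity can be scored as below. The function names and toy data are hypothetical, and the thesis's own variant, which also integrates expression data, may score gene sets differently.

```python
from typing import Dict, Iterable, Set


def covered_patients(mutations: Dict[str, Set[str]], genes: Iterable[str]) -> Set[str]:
    """Return the patients that carry a mutation in at least one of the genes."""
    covered: Set[str] = set()
    for gene in genes:
        covered |= mutations.get(gene, set())
    return covered


def submatrix_weight(mutations: Dict[str, Set[str]], genes: Iterable[str]) -> int:
    """Assumed weight: 2 * |coverage| - sum of per-gene mutation counts.

    High coverage raises the score; patients mutated in several genes of the
    set (non-exclusive mutations) lower it.
    """
    gene_list = list(genes)
    coverage = len(covered_patients(mutations, gene_list))
    total_mutations = sum(len(mutations.get(g, set())) for g in gene_list)
    return 2 * coverage - total_mutations


if __name__ == "__main__":
    # Hypothetical toy data: gene -> set of patients with a mutation in that gene.
    toy = {
        "A": {"p1", "p2"},
        "B": {"p3", "p4"},
        "C": {"p1", "p2", "p3"},
    }
    for gene_set in (["A", "B"], ["A", "C"], ["B", "C"]):
        print(gene_set, submatrix_weight(toy, gene_set))
```

In the toy data, genes A and B together cover all four patients without overlap, so they score higher than either pair that includes the overlapping gene C.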
The improved algorithm, a heuristic optimizer, is called the simulated annealing genetic algorithm (SAGA). In particular, gene expression data are integrated into the algorithm so that its results are more biologically meaningful, and satisfactory results were obtained. Second, in a gene interaction network, a genetic variation changes the network structure by altering or removing a node or by changing a node's connections, which in turn changes the biochemical properties of gene expression in the network and leads to cancer. Based on this biological phenomenon, the DriverFinder algorithm is proposed. It jointly analyzes gene expression data from normal and cancer samples to identify expression outliers, and it filters out random mutations attributable to long gene length by fitting a generalized additive model. DriverFinder identifies biologically meaningful cancer driver mutation genes, and pathway enrichment analysis of these genes then identifies cancer driver pathways. Extensive experimental comparisons show that the algorithm is effective. Finally, the thesis discusses open problems in current research on identifying cancer driver pathways and the work that remains for future research.
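The abstract does not say how DriverFinder defines an expression outlier. The sketch below illustrates only the general idea of comparing tumor expression against the normal-sample distribution, using a per-gene z-score as a stand-in. The z-score rule, the cutoff of 2.0, the function name, and the toy values are all assumptions made for illustration, and the generalized additive model used for the gene-length filter is not shown.

```python
from statistics import mean, stdev
from typing import Dict, List


def expression_outliers(
    normal: Dict[str, List[float]],
    tumor: Dict[str, List[float]],
    z_cutoff: float = 2.0,  # hypothetical threshold, not taken from the thesis
) -> Dict[str, List[int]]:
    """Map each gene to the indices of tumor samples with outlying expression.

    A tumor value is flagged when it lies more than z_cutoff normal-sample
    standard deviations away from the normal-sample mean for that gene.
    """
    flagged: Dict[str, List[int]] = {}
    for gene, normal_values in normal.items():
        # Skip genes that cannot be scored: no tumor data or too few normals.
        if gene not in tumor or len(normal_values) < 2:
            continue
        mu = mean(normal_values)
        sigma = stdev(normal_values)
        if sigma == 0:
            continue
        hits = [
            i for i, value in enumerate(tumor[gene])
            if abs(value - mu) / sigma > z_cutoff
        ]
        if hits:
            flagged[gene] = hits
    return flagged


if __name__ == "__main__":
    # Hypothetical expression values: gene -> one value per sample.
    normal = {"G1": [5.0, 5.2, 4.8, 5.1, 4.9], "G2": [3.0, 3.1, 2.9, 3.0, 3.0]}
    tumor = {"G1": [5.0, 9.0, 5.1], "G2": [3.0, 3.05, 2.95]}
    print(expression_outliers(normal, tumor))
```

Genes flagged this way would then be candidates for the pathway enrichment step the abstract describes.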
[Degree-granting institution]: Qufu Normal University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: R73-3; Q811.4
Article ID: 1801463
Article link: http://sikaile.net/yixuelunwen/zlx/1801463.html