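A Cancer Driver Gene Prediction Model Based on Principal Component Analysis and Neural Networks
Published: 2018-06-07 22:30
Topic: principal component analysis + neural networks; Source: master's thesis, Beijing Jiaotong University, 2017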
[Abstract]: Cancer is one of the main threats to human life and health. It places a heavy psychological and financial burden on individuals and families and seriously affects global economic development and social progress. Research on the mechanisms of cancer and its control has become a priority of global health strategy. Earlier cancer research focused mainly on external causes, and little was known about the intrinsic mechanisms of carcinogenesis until high-throughput sequencing and related methods made it possible to analyze internal causes at the gene level. By analyzing changes in gene expression levels within cells during cancer formation, researchers found that certain genes exert control over tumors: if the expression of these genes, or the pathways they belong to, is inhibited, the events associated with tumor development can be halted. These genes are called cancer driver genes. Driver genes are the most important internal cause of cancer, and therapies that target them could make cancer treatment far more effective.

At present, cancer driver genes are predicted mainly by analyzing sequence alignment results from large numbers of samples. This biology-based approach is easy to understand, but it usually requires sequencing many cancer samples, which is expensive. With the rapid development of molecular biology, organizations such as TCGA (The Cancer Genome Atlas) provide researchers with large, regularly updated data resources, and machine learning and data mining techniques offer strong support for analyzing these data. Driver gene prediction is therefore becoming increasingly data-driven.

This thesis introduces the background, significance, and methods of driver gene research, and describes in detail the basic principles of principal component analysis (PCA) and neural networks and how they are applied here. Based on these two methods, we propose a systems biology model for predicting cancer driver genes. Starting from microarray data, the model derives a predicted set of driver genes step by step, reducing the systematic and human error of the related experimental steps. It can effectively reduce cost and shorten the experimental cycle, and it provides a basis for targeted cancer therapy.

Glioblastoma multiforme is used as the test case. First, the sample data are preprocessed: the tumor expression profiles are normalized, and PCA is then used to filter out expression data that carry no or too little expression information. Second, inspired by module networks, the retained genes are partitioned so that genes with similar mutation rates fall into the same block, and the blocks are ranked. Finally, a restricted Boltzmann machine is trained to produce the predicted set of driver genes. Comparing the predictions with text-mining results shows that about 80% of the predicted genes agree with the text-mining results, which suggests that the proposed model is feasible and effective to a certain degree.
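The preprocessing steps named in the abstract (normalization, PCA-based filtering, and grouping genes into blocks by mutation rate) can be illustrated with a minimal NumPy sketch. The abstract does not give the thesis's actual procedure, so everything here is an assumption made for illustration: the function names, the z-score normalization, the gene score (each gene's variance captured by the top principal components), the `keep_fraction` threshold, and the equal-sized blocks are all hypothetical choices. The restricted Boltzmann machine step is not shown.

```python
import numpy as np

def zscore_genes(expr):
    """Standardize each gene (row) to zero mean and unit variance across samples."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1, keepdims=True)
    std = expr.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant genes become all zeros after centering
    return (expr - mean) / std

def pca_gene_scores(expr, n_components=2):
    """Score each gene by how much of its variance the top PCs capture (assumed criterion)."""
    centered = expr - expr.mean(axis=1, keepdims=True)
    # For a genes x samples matrix, the left singular vectors hold the gene loadings.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, len(s))
    return (u[:, :k] ** 2) @ (s[:k] ** 2)

def filter_genes(expr, keep_fraction=0.5, n_components=2):
    """Return sorted indices of the highest-scoring genes."""
    scores = pca_gene_scores(expr, n_components)
    n_keep = max(1, int(round(keep_fraction * len(scores))))
    return np.sort(np.argsort(scores)[::-1][:n_keep])

def block_by_mutation_rate(rates, n_blocks=3):
    """Split gene indices into blocks of similar mutation rate, highest-rate block first."""
    order = np.argsort(rates)[::-1]
    return [block.tolist() for block in np.array_split(order, n_blocks)]

# Toy data: 4 genes x 4 samples; gene 2 is constant and carries no information.
expr = np.array([[1, 2, 3, 4],
                 [2, 4, 6, 8],
                 [5, 5, 5, 5],
                 [4, 3, 2, 1]])
kept = filter_genes(zscore_genes(expr), keep_fraction=0.75, n_components=1)
blocks = block_by_mutation_rate([0.30, 0.02, 0.28, 0.01, 0.15, 0.14], n_blocks=3)
```

In this toy run the constant gene is the one dropped by the filter, and the six hypothetical mutation rates fall into three blocks ordered from highest to lowest rate. Real microarray data would need additional steps, such as log transformation and batch correction, that are outside the scope of this sketch.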
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: R73-3; TP183
Article ID: 1993056
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1993056.html