Analysis of Soybean Gene Expression Data Based on an Overlapping Community Detection Algorithm
Source: Jilin University, master's thesis, 2017. Document type: degree thesis
Keywords: gene expression data; differential expression analysis; complex network; overlapping community; functional enrichment analysis
[Abstract]: Gene microarray and RNA-Seq technologies have matured rapidly, and gene expression data are now available for a large number of species. Gene expression data reflect the transcription level of genes in a cell at a given moment and carry information about the cell's molecular activity under different conditions. Soybean is an important crop; researchers have studied it extensively with microarray technology and accumulated a large amount of valuable gene expression profile data. Analyzing the biological information implicit in soybean gene expression data is important for research on soybean disease resistance and for improving crop varieties. Common methods for analyzing gene expression data include differential expression analysis, classification, and clustering. Clustering algorithms are unsupervised learning algorithms that have been widely applied to gene expression data and can support exploratory analysis. Genes often interact to form community structures that carry out a particular biological function; genes in such a community structure are called co-expressed genes, and finding them through clustering is of considerable value. In recent years, community detection algorithms for complex networks have made great progress. By computing the similarity between genes, a gene expression network can be constructed, which turns the clustering problem into a community detection problem. Studies have shown that a gene often takes part in more than one biological function, so different groups of co-expressed genes overlap. Traditional clustering algorithms such as k-means and hierarchical clustering cannot detect this overlapping structure. Fuzzy clustering algorithms can identify the overlap, but they have many parameters that are hard to set, perform poorly, and are not suitable for large data sets.
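The difference between hard clustering and fuzzy (overlapping) membership mentioned above can be illustrated with the standard fuzzy c-means membership formula. This is only a minimal sketch: the cluster centers, the fuzzifier `m`, and the `cutoff` used to decide overlap are hypothetical values chosen for illustration, and they are exactly the kind of parameters that have to be set by hand.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with a center belongs fully to that cluster.
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]

def overlapping_assignment(memberships, cutoff=0.3):
    """Indices of every cluster whose membership reaches the (assumed) cutoff."""
    return [i for i, u in enumerate(memberships) if u >= cutoff]

# Two hypothetical cluster centers in a 2-D expression space.
centers = [(0.0, 0.0), (4.0, 0.0)]

near_first = fuzzy_memberships((1.0, 0.0), centers)   # mostly cluster 0
in_between = fuzzy_memberships((2.0, 0.0), centers)   # shared equally

print(overlapping_assignment(near_first))   # a single cluster
print(overlapping_assignment(in_between))   # both clusters: an overlapping gene
```

A hard clustering method such as k-means would force the in-between point into exactly one cluster, whereas the membership vector lets it belong to both.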
The overlap in gene expression data can instead be studied with overlapping community detection algorithms. SpeakEasy is a typical overlapping community detection algorithm: a label propagation algorithm that combines top-down and bottom-up strategies, so that when it assigns a node it considers not only the information in the node's local subgraph but also the structure of the whole network. SpeakEasy has the following advantages: it predicts the number of communities automatically, with no parameters to set by hand; it applies to many kinds of network graphs; and it runs fast. In our experiments, however, SpeakEasy often produced an unreasonably large proportion of overlapping nodes. To address this defect, we propose an improved SpeakEasy algorithm for identifying overlapping nodes and demonstrate its effectiveness experimentally. This thesis uses the soybean rust gene expression data on the GPL4592 platform in the GEO database. First, following the standard gene expression analysis workflow, the data were preprocessed and 7971 differentially expressed genes were screened out. Second, the Pearson correlation coefficient was used to measure the similarity between genes, and a weighted network graph G(V, E) of the differentially expressed soybean genes was constructed. The improved SpeakEasy algorithm was then used to partition G into communities. Finally, the DAVID online analysis tool was used to perform functional enrichment analysis on the resulting communities.
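The network construction step (Pearson correlation as gene similarity, then a weighted graph G(V, E)) might look roughly like the sketch below. The abstract does not say how edges were selected or weighted, so the `threshold` value, the use of the absolute correlation as the edge weight, and the toy expression profiles are all assumptions made for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation is undefined, treated as 0 here
    return cov / math.sqrt(var_x * var_y)

def build_coexpression_graph(profiles, threshold=0.8):
    """Return weighted edges {(gene_a, gene_b): |r|} for pairs with |r| >= threshold."""
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= threshold:
            edges[(a, b)] = abs(r)
    return edges

# Toy expression profiles (hypothetical values, not taken from the thesis data).
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.1, 3.9, 6.2, 8.0],
    "gene_c": [4.0, 1.0, 3.5, 1.2],
}
edges = build_coexpression_graph(profiles, threshold=0.8)
for (a, b), weight in edges.items():
    print(a, b, round(weight, 3))
```

The vertices V are the genes and the edge dictionary plays the role of E with its weights; a community detection algorithm would then take this graph as input. Comparing all pairs is quadratic in the number of genes, so for several thousand genes a vectorized correlation matrix would be the more practical choice.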
The analysis shows that the genes in community S3 mainly regulate the synthesis of flavonoids, and a rise in flavonoid content helps improve plant disease resistance; the genes in community S2 regulate the response of soybean cells to stimuli; genes in some other communities regulate chlorophyll synthesis and photosynthesis, and genes in still others mainly take part in regulating the transcription and expression of soybean genes. Comparing our results with the existing literature, we analyzed the pathology of soybean rust and found that soybean cells mount defenses under rust infection, such as increased levels of flavonoids and aromatic compounds and thickened, strengthened cell walls. In summary, the main work of this thesis has three parts: first, preprocessing the data and identifying the differentially expressed genes; second, improving the SpeakEasy overlapping community identification algorithm and using the improved algorithm to partition the differentially expressed genes into communities; third, performing enrichment analysis on the partition results with DAVID, together with KEGG mapping and GO analysis of key genes and gene sets. This work offers some help in understanding the mechanism by which the rust pathogen affects soybean growth and in further analyzing soybean defense responses under rust stress, and it also contributes to research on soybean disease resistance.
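The idea behind functional enrichment analysis of a community can be sketched with a plain hypergeometric test: given how many background genes carry an annotation, how surprising is the number of annotated genes inside one community? This sketch is a simplification and does not claim to reproduce the statistic DAVID itself reports. Using the 7971 differentially expressed genes as the background is an assumption, and the annotation and community counts are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, community, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(population, annotated, community).

    population: number of background genes
    annotated:  background genes carrying the functional term
    community:  number of genes in the community being tested
    overlap:    community genes carrying the term
    """
    upper = min(annotated, community)
    total = comb(population, community)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, community - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 120 of 7971 background genes carry a term,
# and 15 of them fall into a community of 200 genes.
p = enrichment_p_value(population=7971, annotated=120, community=200, overlap=15)
print(f"enrichment p-value: {p:.3e}")
```

By chance roughly 3 annotated genes would be expected in a community of this size, so observing 15 gives a very small p-value. In practice the p-values for many terms and communities would also need a multiple-testing correction.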
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: S565.1; Q811.4
Article link: http://sikaile.net/shoufeilunwen/zaizhiyanjiusheng/1415438.html