Analysis of Soybean Gene Expression Data Based on an Overlapping Community Detection Algorithm
Source: Jilin University, master's thesis, 2017; Thesis type: degree thesis
Keywords: gene expression data; differential expression analysis; complex networks; overlapping communities; functional enrichment analysis
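【Abstract】: With the rapid maturation of gene microarray and RNA-Seq technologies, gene expression data have been obtained for a large number of species. Gene expression data reflect the transcription levels of a cell's genes at a given moment and carry information about the cell's molecular activity under different conditions. Soybean is an important crop; researchers have studied it extensively with microarray technology and accumulated a large amount of valuable gene expression profile data. Mining the biological information hidden in soybean gene expression data is important for research on soybean disease resistance and for crop improvement. Common methods for analyzing gene expression data include differential expression analysis, classification, and clustering. Clustering algorithms are unsupervised learning methods that have been widely used in gene expression analysis and support exploratory analysis of such data. Genes often interact to form community structures that carry out particular biological functions; genes with such a community structure are called co-expressed genes, and finding them through clustering is of great significance. In recent years, community detection algorithms for complex networks have advanced considerably. By computing pairwise similarities between genes, one can build a gene expression network and thereby turn the clustering problem into a community detection problem. Studies have shown that a single gene often participates in more than one biological function, so different groups of co-expressed genes overlap. Traditional clustering algorithms such as k-means and hierarchical clustering cannot detect this overlapping structure. Fuzzy clustering can identify overlaps, but it has many parameters that are hard to set, has relatively low performance, and does not suit large datasets.
Overlapping community detection algorithms can be used to study this overlap in gene expression data. The Speak Easy algorithm is a typical overlapping community detection algorithm: it is a label propagation algorithm that combines top-down and bottom-up strategies, so that when assigning a node it considers not only the local subgraph around the node but also the structure of the whole network. Speak Easy has the following advantages: it predicts the number of communities automatically, with no manually set parameters; it applies to many kinds of network graphs; and it runs fast. In our experiments, however, we found that when identifying overlapping nodes, Speak Easy often labels an unreasonably large proportion of nodes as overlapping. To address this shortcoming, we propose an improved overlapping-node identification method for Speak Easy and demonstrate its effectiveness experimentally. This thesis uses soybean rust gene expression data from the GPL4592 platform in the GEO database. First, following a standard gene expression analysis workflow, the data were preprocessed and 7,971 differentially expressed genes were screened out. Second, the Pearson correlation coefficient was used to measure similarity between genes, and a weighted network G(V, E) of the soybean differentially expressed genes was constructed. Next, the improved Speak Easy algorithm was used to partition G into communities. Finally, functional enrichment analysis of the resulting communities was performed with the DAVID online analysis tool.
The analysis shows that genes in community S3 mainly regulate the synthesis of flavonoids, whose increased content helps improve plant disease resistance; genes in community S2 regulate soybean cells' responses to stimuli; genes in some other communities regulate chlorophyll synthesis and photosynthesis; and genes in still others are mainly involved in regulating the transcription of soybean genes. Comparing our results with the existing literature, we analyzed the pathology of soybean rust and found that under rust infection soybean cells mount several defenses, such as increased levels of flavonoids and aromatic compounds and thicker, stronger cell walls. In summary, this thesis makes three main contributions: first, the data are preprocessed and differentially expressed genes are identified; second, the Speak Easy overlapping community detection algorithm is improved, and the improved algorithm is used to partition the differentially expressed genes into communities; third, the resulting communities are analyzed for enrichment with DAVID, and key genes or gene sets are mapped to KEGG pathways and analyzed with GO. This work helps clarify the mechanism by which the rust pathogen affects soybean growth, supports further analysis of soybean defense responses under rust stress, and contributes to research on soybean disease resistance.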
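The abstract says the data were preprocessed and differentially expressed genes were screened out, but it does not name the statistic or the cutoffs used. As a minimal sketch only, assuming log2-scale expression values and a two-group design (infected vs. mock), the snippet below keeps genes that pass both a log2 fold-change cutoff and a Welch's t-statistic cutoff. The helper names `welch_t` and `screen_degs`, the thresholds `min_lfc` and `min_abs_t`, and the toy data are all hypothetical, not the thesis's settings.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two independent samples with unequal variances."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))  # sample variances (n - 1)
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def screen_degs(expr, treated_cols, control_cols, min_lfc=1.0, min_abs_t=3.0):
    """Return IDs of genes passing both a log2 fold-change and a |t| cutoff.

    expr maps gene ID -> list of log2 expression values (one per array).
    The default cutoffs are illustrative assumptions, not the thesis's settings.
    """
    degs = []
    for gene, values in expr.items():
        treated = [values[i] for i in treated_cols]
        control = [values[i] for i in control_cols]
        lfc = mean(treated) - mean(control)  # on log2 data this is the log2 fold change
        if abs(lfc) >= min_lfc and abs(welch_t(treated, control)) >= min_abs_t:
            degs.append(gene)
    return degs


# Toy data: columns 0-2 are "infected", columns 3-5 are "mock" (hypothetical).
expr = {
    "g1": [8.0, 8.2, 7.9, 5.0, 5.1, 4.9],  # strongly up
    "g2": [6.0, 6.1, 5.9, 6.0, 6.2, 5.8],  # unchanged
    "g3": [4.0, 4.1, 3.9, 6.0, 6.1, 5.9],  # strongly down
}
print(screen_degs(expr, [0, 1, 2], [3, 4, 5]))  # ['g1', 'g3']
```

A fixed |t| cutoff is only a crude stand-in for a significance test; in practice one would convert the statistic to a p-value (for example with `scipy.stats.ttest_ind(a, b, equal_var=False)`) and correct for testing thousands of genes at once.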
[Abstract]: With the rapid development and maturation of gene microarray and RNA-Seq technologies, gene expression data have been obtained for many species. Gene expression data reflect the transcription levels of genes in biological cells at a given time and contain information about the cells' molecular activity under different conditions. Soybean is an important crop, and many studies have applied microarray technology to it, yielding a wealth of valuable gene expression data. Analyzing the biological information implicit in soybean gene expression data is of great significance for research on soybean disease resistance and for improving crop varieties. Common methods for analyzing gene expression data include differential expression analysis, classification, and clustering. Clustering, an unsupervised learning approach, has been widely used for exploratory analysis of gene expression data. Genes often interact to form community structures that express a particular biological function; genes sharing such a structure are called co-expressed genes, and finding them by clustering is of great significance. In recent years, community detection algorithms for complex networks have made great progress. By computing the similarity between genes, a gene expression network can be constructed and the clustering problem converted into a community detection problem. Studies have shown that a gene often participates in more than one biological function, so co-expressed gene groups of different types overlap. Traditional clustering algorithms such as k-means and hierarchical clustering cannot find this overlapping structure; fuzzy clustering can identify overlaps, but its many parameters are hard to set, its performance is relatively low, and it is not suitable for large datasets.
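The step that turns clustering into community detection is building a gene network from pairwise similarities; the thesis uses the Pearson correlation coefficient to weight the edges of G(V, E). The sketch below shows one way to do that on a small expression table. Keeping an edge only when |r| reaches a cutoff, and weighting it by |r|, are assumptions made here for illustration, since the abstract does not say how (or whether) the network was thresholded; `build_coexpression_graph` and `min_abs_r` are hypothetical names.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation; treat as unlinked
    return sxy / sqrt(sxx * syy)


def build_coexpression_graph(expr, min_abs_r=0.8):
    """Weighted undirected graph as an adjacency dict: gene -> {neighbour: weight}.

    An edge is kept when |r| >= min_abs_r and weighted by |r|; both the cutoff
    and the use of |r| are assumptions made for this sketch.
    """
    graph = {gene: {} for gene in expr}
    for g1, g2 in combinations(expr, 2):
        r = pearson(expr[g1], expr[g2])
        if abs(r) >= min_abs_r:
            graph[g1][g2] = graph[g2][g1] = abs(r)
    return graph


expr = {
    "a": [1, 2, 3, 4],
    "b": [2, 4, 6, 8],  # perfectly correlated with a
    "c": [4, 3, 2, 1],  # perfectly anti-correlated with a
    "d": [1, 3, 2, 1],  # only weakly related to the others
}
graph = build_coexpression_graph(expr)
print(sorted(graph["a"]), graph["d"])  # ['b', 'c'] {}
```

The all-pairs loop is quadratic in the number of genes: 7,971 genes give 31,764,435 pairs, so for data of that size a vectorized routine such as `numpy.corrcoef` on the gene-by-sample matrix is the practical choice.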
Overlapping community detection algorithms can be used to study this overlap in gene expression data. The Speak Easy algorithm is a typical overlapping community detection algorithm: it is a label propagation algorithm that combines top-down and bottom-up strategies, so that when partitioning nodes it considers not only local subgraph information but also the structure of the whole network. Speak Easy has the following advantages: it predicts the number of communities automatically without parameter setting; it applies to a variety of network graphs; and it is fast. In our experiments, however, we found that Speak Easy often identifies an unreasonably large proportion of nodes as overlapping. To solve this problem, we propose an improved overlapping-node identification algorithm for Speak Easy, and experiments demonstrate its effectiveness. This thesis uses soybean rust gene expression data from the GPL4592 platform in the GEO database. First, following the gene expression data analysis workflow, the data were preprocessed and 7,971 differentially expressed genes were screened. Second, the Pearson correlation coefficient was used to measure the similarity between genes, and a weighted network G(V, E) of the soybean differentially expressed genes was constructed. Then the improved Speak Easy algorithm was used to partition G into communities. Finally, functional enrichment analysis of the community partition was carried out with the DAVID online analysis tool.
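Speak Easy is described above as label propagation that weighs a node's local neighbourhood against the structure of the whole network. The sketch below illustrates that idea for a single node's update, plus a simple overlap rule applied to a finished partition. Both pieces are assumptions for illustration: `choose_label`, `overlapping_memberships` and the `ratio` cutoff are hypothetical; the expected-weight term (node strength times a label's share of all nodes) is our reading of "considering the overall network"; and the overlap rule is neither the original Speak Easy criterion (which, as we understand it, works from how often nodes co-occur across repeated runs) nor the improved criterion proposed in the thesis, whose details the abstract does not give.

```python
from collections import defaultdict


def choose_label(neighbours, labels, label_freq, n_nodes):
    """Next label for one node under a Speak Easy-style update (sketch).

    neighbours: dict neighbour -> edge weight
    labels:     dict node -> current label (integers)
    label_freq: dict label -> number of nodes currently holding that label
    Each neighbour label's observed weight (local, bottom-up) is compared with
    the weight expected from that label's share of the whole network (global,
    top-down); the label with the largest excess wins, ties go to the smaller label.
    """
    strength = sum(neighbours.values())
    observed = defaultdict(float)
    for u, w in neighbours.items():
        observed[labels[u]] += w

    def excess(label):
        return observed[label] - strength * label_freq[label] / n_nodes

    return min(observed, key=lambda label: (-excess(label), label))


def overlapping_memberships(graph, partition, ratio=0.8):
    """Let a node join every community it is strongly tied to.

    graph:     dict node -> {neighbour: weight}
    partition: dict node -> community id (e.g. the converged labels)
    ratio:     hypothetical cutoff; a node also joins community C when its
               total edge weight into C is at least ratio * its strongest tie.
    """
    memberships = {}
    for v, nbrs in graph.items():
        ties = defaultdict(float)
        for u, w in nbrs.items():
            ties[partition[u]] += w
        top = max(ties.values(), default=0.0)
        extra = {c for c, w in ties.items() if w >= ratio * top}
        memberships[v] = {partition[v]} | extra
    return memberships


# Majority vote would pick label 1, but label 1 is common network-wide,
# so its excess over expectation is smaller than that of the rarer label 2.
labels = {"a": 1, "b": 1, "c": 2}
print(choose_label({"a": 1.0, "b": 1.0, "c": 1.0}, labels, {1: 50, 2: 2}, 100))  # 2

# Two triangles {0,1,2} and {3,4,5} joined by edge 2-3; node 6 touches both.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3), (6, 1), (6, 4)]
graph = defaultdict(dict)
for u, v in edges:
    graph[u][v] = graph[v][u] = 1.0
partition = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 0}
print(overlapping_memberships(graph, partition)[6])  # {0, 1}
```

Raising `ratio` toward 1 marks fewer nodes as overlapping, which is the kind of control that matters given the abstract's observation that Speak Easy tends to flag too many overlapping nodes.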
The results show that genes in community S3 mainly regulate flavonoid synthesis, and an increased flavonoid content helps improve plant disease resistance; genes in community S2 regulate the response of soybean cells to stimuli; genes in some communities regulate chlorophyll synthesis and photosynthesis; and genes in other communities are mainly involved in regulating the transcription of soybean genes. Comparing these results with the existing literature, we analyzed the pathology of soybean rust and found that under the influence of rust, soybean cells mount defenses such as increased contents of flavonoids and phenolic compounds and thickened, strengthened cell walls. In summary, the main work of this thesis has three parts: first, the data are preprocessed and differentially expressed genes are identified; second, the Speak Easy overlapping community detection algorithm is improved and used to partition the differentially expressed genes into communities; third, the partitioning results are analyzed for enrichment with DAVID, and key genes or gene sets are analyzed with KEGG mapping and GO. This work helps in understanding the mechanism by which rust affects soybean growth, supports further analysis of soybean defense responses under rust stress, and contributes to research on soybean disease resistance.
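Functional enrichment asks whether a term (a GO term or KEGG pathway) appears in a community more often than the background would predict. As a minimal sketch of that idea, the one-sided hypergeometric test below computes P(X >= k). It is not DAVID's exact statistic: DAVID reports a modified Fisher exact p-value (the EASE score), so its numbers will differ. The function name `enrichment_p` and the toy counts are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k) for term over-representation.

    k: genes in the community annotated with the term
    n: genes in the community
    K: background genes annotated with the term
    N: background genes in total
    """
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)  # math.comb returns 0 when n - i > N - K


# Toy background of 10 genes, 4 annotated; a community of 3 genes, all annotated.
print(enrichment_p(3, 3, 4, 10))  # 4/120, about 0.0333
print(enrichment_p(0, 3, 4, 10))  # 1.0: at least zero hits is certain
```

Because many terms are tested per community, the resulting p-values also need a multiple-testing correction before any term is called enriched.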
【Degree-granting institution】: Jilin University
【Degree level】: Master's
【Year conferred】: 2017
【CLC number】: S565.1; Q811.4