Construction of a Life Science Knowledge Network System and Network Information Analysis
Published: 2018-04-16 12:00
Topic: bioinformatics database + network analysis; Reference: Zhejiang University, 2012 doctoral dissertation
[Abstract]: With the massive output of high-throughput data analysis, bioinformatics databases and systems biology have become increasingly important in life science research. The large number of databases and web services leaves users at risk of being overwhelmed by data, and how to organize and use this information effectively has become a focus of bioinformatics research. To build a unified bioinformatics framework that can integrate, organize, and analyze data and information of different sources and types, we carried out a basic analysis of the data structures and information composition of biological information. Building on the processing of raw data, this study designed a data framework in which concepts are nodes and relations are edges. We built a unified ontology library for a massive number of life science concepts and a new semantics-based literature search engine. We also developed a new set of network analysis algorithms; combined with our standardized information scores, they let us quickly compute and rank the most relevant concepts and possible information pathways, and ultimately offer possible biological explanations. On the basis of this research and data processing, we developed a life science knowledge engine named BioPubInfo (http://www.biopubinfo.org), comprising a literature search engine and a network knowledge analysis engine. The interface and back end of the network knowledge analysis engine are preliminarily complete, while the literature search engine is still being refined. In analyzing and processing massive life science data, we designed and worked out a set of algorithms for analyzing and processing massive data and for using the network structure of the data to search for and predict new knowledge. By taking full advantage of graph databases and a graph data structure framework, the new algorithms achieve real-time analysis of concept-relation networks at the scale of hundreds of millions of entries, and on this basis we predicted candidate genes for human diseases and for traits of Arabidopsis thaliana and rice. Based on the resulting concept network and its underlying ideas, we predicted relationships between rice phenotypes and genes and integrated other information to build the QTXtoGene analysis platform; more species and traits will be added later. During global data integration, we also analyzed several further problems, including the salt-stress expression regulatory network of Arabidopsis thaliana and genome evolution and horizontal transfer. We constructed expression regulatory networks of Arabidopsis thaliana roots at different time points under salt stress and, using a new horizontal gene transfer detection method, identified 10 horizontally transferred genes in the silkworm (Bombyx mori) genome. We also applied a mutual information method to analyze the relationship network among different sites of the influenza virus receptor protein.
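The abstract describes a data framework with concepts as nodes and relations as edges, in which standardized scores are used to rank the most relevant concepts. The thesis's actual algorithm is not given here, so the following is only a minimal sketch of that general idea under stated assumptions: each relation carries a score in (0, 1], a path is scored by the product of its edge scores, and related concepts are ranked by their best path score using a Dijkstra-style search. The function names (`build_graph`, `rank_related`) and the toy relations are hypothetical and are not taken from BioPubInfo.

```python
import heapq
from collections import defaultdict


def build_graph(relations):
    """Build an undirected concept graph from (concept_a, concept_b, score) triples.

    Assumption: every score lies in (0, 1]; higher means a stronger relation.
    """
    graph = defaultdict(dict)
    for a, b, score in relations:
        # Keep the strongest score if the same pair appears more than once.
        best = max(score, graph[a].get(b, 0.0))
        graph[a][b] = best
        graph[b][a] = best
    return graph


def rank_related(graph, start):
    """Rank all reachable concepts by their best path score from `start`.

    A path's score is the product of its edge scores, so it can only shrink
    as the path grows; that makes a Dijkstra-style best-first search valid.
    """
    best = {start: 1.0}
    heap = [(-1.0, start)]  # max-heap emulated with negated scores
    while heap:
        neg_score, node = heapq.heappop(heap)
        score = -neg_score
        if score < best.get(node, 0.0):
            continue  # stale heap entry; a better path was already found
        for neighbor, weight in graph[node].items():
            candidate = score * weight
            if candidate > best.get(neighbor, 0.0):
                best[neighbor] = candidate
                heapq.heappush(heap, (-candidate, neighbor))
    ranked = [(c, s) for c, s in best.items() if c != start]
    # Highest score first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy relations, invented purely for illustration.
toy_relations = [
    ("salt stress", "geneA", 0.9),
    ("geneA", "root growth", 0.5),
    ("salt stress", "geneB", 0.4),
    ("geneB", "root growth", 0.8),
]
toy_graph = build_graph(toy_relations)
for concept, score in rank_related(toy_graph, "salt stress"):
    print(f"{concept}: {score:.2f}")
```

In this toy graph, "root growth" is reached through two candidate paths and keeps the stronger one, which is the sense in which such a search surfaces a "possible information pathway" between two concepts.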
[Degree-granting institution]: Zhejiang University
[Degree level]: Doctoral
[Year degree conferred]: 2012
[Classification number]: TP391.3; Q811.4
Article ID: 1758782
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1758782.html