Research on Genotype-Phenotype Association Techniques Based on In-Memory Computing
Published: 2018-02-28 23:28
Keywords: disease phenotype, disease-causing gene, prioritization, TrustRank, big data. Source: Harbin Institute of Technology, master's thesis, 2017. Type: dissertation
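【Abstract】: With the explosive growth of biomedical data, the rapidly developing field of bioinformatics keeps uncovering the information hidden behind these data, and related research has become a hot topic. Identifying disease-causing genes is a fundamental challenge in human health research, and it requires understanding, through biological networks, how genotypes are associated with disease phenotypes. Massive amounts of biological data are stored in many databases that follow no unified standard; biological networks are built on these data, and studying such networks is also a way of exploring complex life processes. Associations between disease phenotypes and genotypes are of far-reaching significance both for predicting disease-causing genes and for finding the diseases that genes cause. Disease modularity suggests that functionally related proteins cause similar diseases. Accordingly, most methods for studying disease-gene associations are network-based computational methods that integrate a protein-protein interaction (PPI) network, a disease phenotype similarity network, and a disease-gene bipartite network. Online Mendelian Inheritance in Man (OMIM) is a database of human genetic diseases and their associated genes; from OMIM data we computed a disease phenotype similarity network and a disease-gene association network, and combined them with a PPI network into a complex heterogeneous network. This thesis introduces the related random walk with restart algorithms and develops the YSearch method by modifying the web-page ranking algorithm TrustRank. The algorithm first selects, from the constructed network, prior knowledge about the query disease (or gene), namely a seed set; it then iterates a random-walk strategy over the whole network to obtain TR scores, and finally ranks candidate genes and diseases by priority to make predictions. The algorithm was evaluated with leave-one-out cross-validation, and ROC curves comparing its results with those of other methods demonstrate its good performance.

On this basis, we designed and developed YSearch, a search engine platform for genes and diseases. The whole system is built on Spark, a big-data platform based on in-memory computing, with data stored in HBase; the thesis also describes and optimizes the system. Both the algorithm and the platform can offer new ideas for clinical research such as disease diagnosis and treatment.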
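The seed-and-propagate ranking described in the abstract can be illustrated with unmodified TrustRank, whose update t = alpha * T t + (1 - alpha) * d has the same form as a random walk with restart. The abstract does not say how YSearch modifies TrustRank, so this is a minimal sketch, not the thesis's method. It assumes the three networks are merged into one undirected weighted graph, each disease-gene link gets weight 1.0, the restart distribution d is spread uniformly over the seed set, and alpha = 0.85 with 50 iterations; `build_network`, `trustrank`, `prioritize` and the toy edge lists are hypothetical.

```python
from collections import defaultdict


def build_network(ppi, phenotype_sim, disease_gene):
    """Merge three edge lists into one undirected, weighted graph.

    ppi           -- (gene, gene, weight) protein-interaction edges
    phenotype_sim -- (disease, disease, similarity) edges
    disease_gene  -- (disease, gene) known associations
    """
    graph = defaultdict(dict)
    # Assumption: every disease-gene link is given weight 1.0.
    edges = list(ppi) + list(phenotype_sim) + [(d, g, 1.0) for d, g in disease_gene]
    for u, v, w in edges:
        # Store both directions; repeated edges accumulate weight.
        graph[u][v] = graph[u].get(v, 0.0) + w
        graph[v][u] = graph[v].get(u, 0.0) + w
    return dict(graph)


def trustrank(graph, seeds, alpha=0.85, iterations=50):
    """Iterate t <- alpha * T t + (1 - alpha) * d (plain TrustRank / RWR).

    T moves each node's score to its neighbours in proportion to edge
    weight; d is uniform over the seed nodes (the prior knowledge).
    """
    nodes = list(graph)
    present = {s for s in seeds if s in graph}
    d = {n: 0.0 for n in nodes}
    for s in present:
        d[s] = 1.0 / len(present)
    t = dict(d)
    for _ in range(iterations):
        nxt = {n: (1 - alpha) * d[n] for n in nodes}
        for u in nodes:
            total = sum(graph[u].values())
            for v, w in graph[u].items():
                nxt[v] += alpha * t[u] * w / total
        t = nxt
    return t


def prioritize(scores, candidates):
    """Rank candidates by score, highest first (ties keep input order)."""
    return sorted(candidates, key=lambda c: scores.get(c, 0.0), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: D1 resembles D2, and D2 is linked to G2.
    ppi = [("G1", "G2", 1.0), ("G2", "G3", 1.0), ("G4", "G5", 1.0)]
    phenotype_sim = [("D1", "D2", 0.8)]
    disease_gene = [("D2", "G2")]
    net = build_network(ppi, phenotype_sim, disease_gene)
    scores = trustrank(net, seeds={"D1"})
    print(prioritize(scores, ["G1", "G2", "G3", "G4", "G5"]))
    # ['G2', 'G1', 'G3', 'G4', 'G5']
```

Merging the layers keeps the sketch short; heterogeneous-network methods often keep the gene and disease layers separate and add a parameter for how often the walker jumps between them.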
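The evaluation in the abstract (leave-one-out cross-validation, compared through ROC curves) can be sketched as follows. The abstract does not specify how folds or negatives are built, so these choices are assumptions: each known disease-gene pair is held out once, the prioritizer is rerun without it, every gene with no known link to that disease is a negative, and the per-fold AUC is averaged. With a single positive, the AUC equals the Mann-Whitney statistic (ties count one half). `leave_one_out_auc`, `auc_single_positive` and the `score_fn` interface are hypothetical names.

```python
def auc_single_positive(pos_score, neg_scores):
    """Area under the ROC curve for one positive against many negatives.

    Counts the negatives the positive outranks, with ties worth one half
    (the Mann-Whitney U statistic divided by the number of pairs).
    """
    if not neg_scores:
        raise ValueError("at least one negative score is required")
    wins = sum(1.0 if pos_score > s else 0.5 if pos_score == s else 0.0
               for s in neg_scores)
    return wins / len(neg_scores)


def leave_one_out_auc(associations, genes, score_fn):
    """Hold out each known (disease, gene) pair once and average the AUCs.

    score_fn(training_pairs, disease) -> {gene: score} stands for any
    seed-based prioritizer rerun without the held-out pair. Negatives are
    the genes with no known association to that disease (an assumption).
    """
    aucs = []
    for held_out in associations:
        disease, gene = held_out
        training = [p for p in associations if p != held_out]
        scores = score_fn(training, disease)
        known = {g for d, g in associations if d == disease}
        negatives = [scores.get(g, 0.0) for g in genes if g not in known]
        aucs.append(auc_single_positive(scores.get(gene, 0.0), negatives))
    return sum(aucs) / len(aucs)


if __name__ == "__main__":
    # Hypothetical fixed scores standing in for a real prioritizer.
    fixed = {"G1": 0.9, "G2": 0.8, "G3": 0.7, "G4": 0.1}
    pairs = [("D1", "G1"), ("D1", "G2"), ("D2", "G3")]
    auc = leave_one_out_auc(pairs, ["G1", "G2", "G3", "G4"],
                            lambda training, disease: fixed)
    print(round(auc, 3))  # 0.778
```

Averaging per-fold AUCs is only one summary; pooling the held-out genes' rank positions across all folds into a single ROC curve is another common choice.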
【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: R3416;TP311.13
Article ID: 1549457
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1549457.html