
Research on Heterogeneous Data Integration for the Interactome

Published: 2018-04-24 19:07

  Topics: data integration + heterogeneous database systems; Source: Peking Union Medical College, 2011 doctoral dissertation


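【Abstract】: A key goal of post-genome biomedicine is the comprehensive and systematic study of all the molecules in living cells and the interactions among them. A key step toward understanding cellular systems is mapping the physical interaction networks among DNA, RNA, proteins, small chemical molecules, and related entities, so as to build an interactome network for a given species that is as complete and accurate as possible. Researchers have obtained large amounts of valuable interactome data through high-throughput experiments, computational prediction, and literature mining, and have built many interactome databases to manage and use these data. However, the existing interactome databases are isolated from one another, forming so-called "information islands" that prevent data sharing and more effective use. To manage and use existing interactome data better, these independent databases need to be integrated into a coherent whole. This matters for raising the overall level of knowledge in interactome research and for a deeper, more complete understanding of the field, and data integration has become one of the important directions of interactome research.

This study built an interactome data warehouse, InteractomeDW. InteractomeDW consists of four main parts: a collection of interactome databases, a biological entity mapping database, a collection of biological ontology and controlled vocabulary databases, and biological annotation databases. InteractomeDW stores 62,779,056 interaction records, covering 51 interactome data sources, 9 auxiliary data sources, 5 interactome data types (protein interactions, domain interactions, intermolecular interactions, complexes, and pathways), 2,426 species, 170 interaction identification methods, 44 interaction types, and 85,212 references. To our knowledge, InteractomeDW is larger than the data warehouses built in existing related studies.

This study is the first to propose WM, a new heterogeneous data integration method that combines the data warehouse approach and the mediation approach. WM uses a data warehouse for data management, to ensure data source availability and to improve query efficiency and data quality. All interactome databases to be integrated are stored on a local server, which maximizes data source availability; this local storage strategy also significantly improves the system's query efficiency and responsiveness. The data cleaning function of the interactome data warehouse can detect, correct, or delete corrupt, incomplete, or inaccurate dirty data in all the interactome databases, thereby improving the quality of the integrated data. WM uses mediation to carry out the actual data integration, to improve system extensibility and maintainability. In WM, the scope of integration can be extended simply by registering a new data source in the mapping table of the mediator module and building a corresponding wrapper. This kind of extension has no effect on the rest of the integration system, which makes the integration pluggable, and this loosely coupled, flexible approach greatly improves the maintainability of the system. By combining the strengths of both approaches, WM balances the efficiency and the flexibility of data integration and provides an infrastructure and solution for interactome data integration.

Using WM, this study built IMbase, a web-based heterogeneous interactome data integration system with high reliability, high data quality, high query efficiency, and strong extensibility. IMbase is intended to let biologists access heterogeneous interactome databases transparently and use their data more effectively. It is a basic platform for sharing and using interactome data: it provides biologists with interactome data integration, interaction network analysis and reasoning, and a corresponding Web Service development interface, helping them generate interaction hypotheses and make knowledge discoveries. IMbase performs vertical integration of interactome-related data. By promptly summarizing and organizing existing data, this enables broader data sharing within the interactome field and raises its overall level of knowledge. Building on this vertical integration, horizontal integration of data across domains and disciplines can then be pursued for more valuable knowledge discovery. To our knowledge, IMbase is the interactome data integration system with the largest data volume and the most complete functionality to date. IMbase is freely accessible at http://122.70.220.98/imbase/index.gr.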
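The mediation side of WM, as described above, extends integration by registering a new source in the mediator's mapping table and writing a wrapper for it. The abstract gives no implementation details, so the following is only a minimal sketch of that pluggable pattern, assuming each wrapper converts one locally stored source into a shared record format. `Interaction`, `Mediator`, `wrap_db_a`, `wrap_db_b`, and the toy tables are all hypothetical names, not IMbase code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Interaction:
    """Normalized record every wrapper must emit (hypothetical schema)."""
    source: str    # registered name of the originating interactome database
    entity_a: str
    entity_b: str


# A wrapper adapts one locally stored source to the common record format.
Wrapper = Callable[[str], Iterable[Interaction]]


class Mediator:
    """Hypothetical mediator: a mapping table from source names to wrappers."""

    def __init__(self) -> None:
        self._mapping_table: Dict[str, Wrapper] = {}

    def register(self, source: str, wrapper: Wrapper) -> None:
        # Adding a source only adds a row here; other components are untouched.
        self._mapping_table[source] = wrapper

    def unregister(self, source: str) -> None:
        self._mapping_table.pop(source, None)

    def query(self, entity: str) -> List[Interaction]:
        # Send the query to every registered wrapper and concatenate the results.
        results: List[Interaction] = []
        for wrapper in self._mapping_table.values():
            results.extend(wrapper(entity))
        return results


# Toy local copies with different layouts (stand-ins for warehouse tables).
DB_A_ROWS = [("P1", "P2"), ("P2", "P3")]
DB_B_ROWS = [{"left": "P1", "right": "P4"}]


def wrap_db_a(entity: str) -> Iterable[Interaction]:
    return [Interaction("DB_A", a, b) for a, b in DB_A_ROWS if entity in (a, b)]


def wrap_db_b(entity: str) -> Iterable[Interaction]:
    return [Interaction("DB_B", r["left"], r["right"])
            for r in DB_B_ROWS if entity in (r["left"], r["right"])]


mediator = Mediator()
mediator.register("DB_A", wrap_db_a)
mediator.register("DB_B", wrap_db_b)
hits = mediator.query("P1")
```

Here `mediator.query("P1")` returns one record from each toy source. Calling `mediator.unregister("DB_B")` removes a source without touching the other wrapper or the query code, which is the loose coupling the text attributes to WM.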
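This study applied IMbase to the study of neural tube defects (NTDs) in mice. Using differentially expressed genes identified by expression microarrays as bait, IMbase was used to retrieve the genes corresponding to the biological entities that interact with these differentially expressed genes, and the corresponding interaction networks were constructed. This study also built MouseNTDs, a database of known mouse NTD candidate genes, and used it to screen the potential NTD candidate genes, separating those already recognized as mouse NTD candidate genes from those not yet recognized. Finally, by examining the annotation and pathway information of the screened candidates, this study proposed hypotheses about mouse NTD candidate genes, suggesting possible directions for further molecular biology experiments.

The main innovations of this study are:
1. A new heterogeneous data integration method, WM, was proposed. WM combines the strengths of the data warehouse and mediation approaches, balances the efficiency and flexibility of data integration, and provides an infrastructure and solution for heterogeneous interactome data integration.
2. An interactome data warehouse, InteractomeDW, was built. It stores more than 62 million (62,779,056) interaction records, covering 51 interactome data sources, 9 auxiliary data sources, 5 interactome data types (protein interactions, domain interactions, intermolecular interactions, complexes, and pathways), 2,426 species, 170 interaction identification methods, 44 interaction types, and 85,212 references.
3. A biological entity mapping database, BEM, was built. BEM integrates 5 related data sources and stores more than 180 million (180,328,282) non-redundant mapping records covering 4 entity types (genes, proteins, small molecules, and compounds), enabling entity mapping among 90 commonly used biomedical databases.
4. Using WM, IMbase, a web-based heterogeneous interactome data integration system, was built. IMbase is a computing platform for sharing and using interactome data that provides services such as interactome data integration, interaction network analysis and reasoning, and biological entity mapping, helping researchers generate interaction hypotheses and make knowledge discoveries.
5. In addition to access through a web application, IMbase provides access through Web Services, giving software developers a programming interface that supports software reuse and interoperability.
6. IMbase was applied to the study of mouse NTDs. By constructing and analyzing interaction networks related to potential mouse NTD candidate genes, hypotheses about mouse NTD candidate genes were proposed to guide further molecular biology experiments.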
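The NTD workflow described above has two computational steps: collect the genes that interact with each bait gene, then split those genes into ones already listed as mouse NTD candidates and ones that are not. The sketch below assumes interactions are undirected gene pairs and represents the MouseNTDs lookup as a plain set. The function names and all gene labels are hypothetical placeholders, not results from the thesis.

```python
from typing import Dict, Iterable, Set, Tuple


def expand_baits(baits: Iterable[str],
                 interactions: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each bait gene to the genes that interact with it directly (one hop)."""
    bait_set = set(baits)
    partners: Dict[str, Set[str]] = {bait: set() for bait in bait_set}
    for a, b in interactions:
        # Interactions are treated as undirected, so check both ends.
        if a in bait_set and b != a:
            partners[a].add(b)
        if b in bait_set and a != b:
            partners[b].add(a)
    return partners


def screen_candidates(partners: Dict[str, Set[str]],
                      known: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split interacting genes into already-known and not-yet-known candidates."""
    candidates: Set[str] = set().union(*partners.values())
    return candidates & known, candidates - known


# Toy data: gene labels are placeholders, not real NTD results.
interactions = [("BAIT1", "G1"), ("G2", "BAIT1"), ("BAIT2", "G3"), ("G4", "G5")]
known_ntd_genes = {"G1", "G3"}          # stand-in for a MouseNTDs lookup
partners = expand_baits(["BAIT1", "BAIT2"], interactions)
known, novel = screen_candidates(partners, known_ntd_genes)
```

With this toy input, `partners` is the bait-centered network, `known` holds `G1` and `G3`, and `novel` holds `G2`, the kind of not-yet-recognized candidate the text says was carried forward to annotation and pathway review.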
[Abstract]: One of the key objectives of post-genome biomedical research is a comprehensive and systematic study of all the molecules in living cells and their interactions. A key step in understanding cellular systems is to map the physical interaction networks among DNA, RNA, proteins, small chemical molecules, and related entities, so as to build an interactome network that is as complete and accurate as possible for a given species. To manage and use the resulting data, researchers have built many interactome databases. However, these databases are isolated from one another, so to manage and use existing interactome data better, it is important to integrate these independent databases.


This study established an interactome data warehouse, InteractomeDW, which comprises four parts: a collection of interactome databases, a biological entity mapping database, a collection of biological ontology and controlled vocabulary databases, and a biological annotation database. InteractomeDW stores 62,779,056 interaction records, involving 51 interactome data sources, 9 auxiliary data sources, 5 interactome data types (protein interactions, domain interactions, intermolecular interactions, complexes, and pathways), 2,426 species, 170 interaction identification methods, 44 interaction types, and 85,212 references. As far as we know, InteractomeDW is larger than the data warehouses built in existing related studies.


This study is the first to propose WM, a new heterogeneous data integration method that combines the data warehouse and mediation approaches. WM uses a data warehouse to manage data, which ensures data source availability and improves query efficiency and data quality.
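The warehouse side of WM is credited with improving data quality through a cleaning step that detects, corrects, or deletes corrupt, incomplete, or inaccurate records. The thesis does not list its cleaning rules, so the sketch below assumes three illustrative ones: trim and upper-case identifiers, normalize taxon IDs written as `taxid:NNNN`, and delete records whose identifiers are missing or malformed. `ID_PATTERN`, `normalize_taxid`, and `clean` are hypothetical, and the 6-to-10-character identifier rule is a simplified stand-in rather than any real database's format.

```python
import re
from typing import Iterable, List, Optional

# Simplified, hypothetical identifier rule: 6 to 10 uppercase letters or digits.
ID_PATTERN = re.compile(r"[A-Z0-9]{6,10}")


def normalize_taxid(value: object) -> Optional[int]:
    """Correct values like 'taxid:10090' or ' 10090 ' to 10090; None if unusable."""
    text = str(value).strip().lower()
    if text.startswith("taxid:"):
        text = text[len("taxid:"):].strip()
    return int(text) if text.isdecimal() else None


def clean(records: Iterable[dict]) -> List[dict]:
    """Correct fixable fields and delete incomplete or malformed records."""
    cleaned: List[dict] = []
    for rec in records:
        # Correct: trim whitespace and upper-case identifiers.
        a = str(rec.get("entity_a") or "").strip().upper()
        b = str(rec.get("entity_b") or "").strip().upper()
        taxid = normalize_taxid(rec.get("taxid", ""))
        # Delete: missing or malformed identifiers, or an unparseable taxon.
        if not (ID_PATTERN.fullmatch(a) and ID_PATTERN.fullmatch(b)) or taxid is None:
            continue
        cleaned.append({"entity_a": a, "entity_b": b, "taxid": taxid})
    return cleaned


raw = [
    {"entity_a": " p12345 ", "entity_b": "Q67890", "taxid": "taxid:10090"},
    {"entity_a": "P12345", "entity_b": "", "taxid": 10090},          # incomplete
    {"entity_a": "P1?", "entity_b": "Q67890", "taxid": "10090"},     # corrupt ID
    {"entity_a": "O11111", "entity_b": "P22222", "taxid": "mouse"},  # bad taxon
]
result = clean(raw)
```

Only the first toy record survives, after correction. Deleting rather than guessing values for unfixable records is a deliberate choice in this sketch; a real pipeline might instead quarantine them for review.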


Using the WM method, this study built IMbase, a web-based heterogeneous interactome data integration system with high reliability, high data quality, high query efficiency, and good extensibility. IMbase is intended to let biologists access heterogeneous interactome databases transparently and use their data effectively. It is a basic platform for sharing and using interactome data, offering biologists interactome data integration, interaction network analysis and reasoning, and a corresponding Web Service development interface, which help them generate interaction hypotheses and make knowledge discoveries. IMbase vertically integrates interactome-related data. By summarizing and organizing existing data in a timely manner, this enables wider data sharing within interactome research. On this basis, horizontal integration of cross-domain and cross-disciplinary data can be pursued for more valuable knowledge discovery. As far as we know, IMbase is the interactome data integration system with the largest data volume and the most complete functionality available.
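Vertical integration, as described above, means the same interaction may arrive from several source databases, possibly with its two partners listed in opposite order. One common way to merge such records, shown here as a sketch rather than IMbase's actual procedure, is to key each interaction by an unordered pair and keep the set of supporting sources as provenance. `merge_interactions` and the toy records are hypothetical; the unordered key assumes undirected interactions, and directed data such as pathway steps would need an ordered key instead.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def merge_interactions(
    records: Iterable[Tuple[str, str, str]],
) -> Dict[FrozenSet[str], Set[str]]:
    """Collapse (source, a, b) records into unique undirected pairs with provenance."""
    merged: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for source, a, b in records:
        # frozenset makes (a, b) and (b, a) the same key; a self-interaction
        # becomes a one-element key.
        merged[frozenset((a, b))].add(source)
    return dict(merged)


records = [
    ("DB_A", "P1", "P2"),
    ("DB_B", "P2", "P1"),   # same pair, reversed order, different source
    ("DB_B", "P2", "P3"),
    ("DB_C", "P1", "P2"),
]
merged = merge_interactions(records)
# Rank pairs by how many independent sources report them.
ranked = sorted(merged.items(), key=lambda kv: len(kv[1]), reverse=True)
```

Four raw records collapse to two unique interactions, and the P1/P2 pair ranks first because three sources support it. Keeping provenance instead of discarding duplicates lets a user weigh how well supported each interaction is.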


In this study, the IMbase system was applied to the study of mouse neural tube defects (NTDs).


The main innovations of this study are:


1. A new heterogeneous data integration method, WM, is proposed. WM combines the advantages of the data warehouse and mediation approaches to data integration, balances efficiency and flexibility, and provides an infrastructure and solution for integrating heterogeneous interactome data.


2. An interactome data warehouse, InteractomeDW, was established. It stores more than 62 million (62,779,056) interaction records, involving 51 interactome data sources, 9 auxiliary data sources, 5 interactome data types (protein interactions, domain interactions, intermolecular interactions, complexes, and pathways), 2,426 species, 170 interaction identification methods, 44 interaction types, and 85,212 references.


3. A biological entity mapping database, BEM, was established. BEM integrates five related data sources and stores more than 180 million (180,328,282) non-redundant mapping records covering 4 entity types (genes, proteins, small molecules, and compounds), enabling entity mapping among 90 commonly used biomedical databases.
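BEM, as described above, maps entities among 90 biomedical databases. The abstract does not describe its schema; one plausible design, sketched here under that assumption, stores each identifier once against an internal entity key and translates between any two databases by pivoting through that key. This avoids a separate table for every database pair (90 databases would give 90 × 89 / 2 = 4,005 pairs). `EntityMapper`, the database names `DB_X`/`DB_Y`, and all identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


class EntityMapper:
    """Hypothetical ID mapper that pivots through an internal entity key."""

    def __init__(self, rows: Iterable[Tuple[str, str, str]]) -> None:
        # rows are (entity_key, database, identifier); sets drop duplicate rows.
        self._to_entity: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self._from_entity: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        for entity, database, identifier in rows:
            self._to_entity[(database, identifier)].add(entity)
            self._from_entity[(entity, database)].add(identifier)

    def map_id(self, src_db: str, identifier: str, dst_db: str) -> Set[str]:
        """Return identifiers in dst_db for the entity behind (src_db, identifier)."""
        result: Set[str] = set()
        # .get avoids inserting empty entries into the defaultdicts on lookup.
        for entity in self._to_entity.get((src_db, identifier), set()):
            result |= self._from_entity.get((entity, dst_db), set())
        return result


rows = [
    ("E1", "DB_X", "x1"),
    ("E1", "DB_Y", "y1"),
    ("E1", "DB_Y", "y1"),        # duplicate row, stored once
    ("E2", "DB_X", "x2"),
    ("E2", "DB_Y", "y2"),
]
mapper = EntityMapper(rows)
```

For example, `mapper.map_id("DB_X", "x1", "DB_Y")` yields `{"y1"}`, and an unknown identifier yields an empty set rather than an error. Returning sets also accommodates one-to-many mappings, which are common between gene and protein databases.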


4. Based on the WM method, IMbase, a web-based heterogeneous interactome data integration system, was built. IMbase is a computing platform for sharing and using interactome data. It provides services such as interactome data integration, interaction network analysis and reasoning, and biological entity mapping, and can help researchers generate interaction hypotheses and make knowledge discoveries.
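The abstract does not say which network analyses IMbase offers. As one illustrative example of interaction network analysis, the sketch below finds a fewest-hop path linking two entities with breadth-first search, assuming an undirected network given as edge pairs. `shortest_path` and the toy edges are hypothetical and are not IMbase's API.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple


def shortest_path(edges: Iterable[Tuple[str, str]],
                  start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search for a fewest-hop path in an undirected network."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if start == goal:
        return [start]
    if start not in graph or goal not in graph:
        return None
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk parent links back to start, then reverse.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for neighbor in sorted(graph[node]):  # sorted for deterministic output
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]
path = shortest_path(edges, "A", "D")
```

Because BFS explores the network one hop at a time, the first time it reaches the goal is along a fewest-hop path: here `A → E → D` rather than the three-hop route through `B` and `C`. It returns `None` when the two entities are not connected.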


5. In addition to access through a web application, the heterogeneous data integration system IMbase provides access through Web Services, giving software developers a programming interface that supports software reuse and interoperability.


6. The heterogeneous data integration system IMbase was applied to the study of mouse neural tube defects (NTDs). By constructing and analyzing interaction networks related to potential mouse NTD candidate genes, hypotheses about mouse NTD candidate genes were proposed, providing directions for further molecular biology experiments.

【Degree-granting institution】: Peking Union Medical College
【Degree level】: Doctorate
【Year conferred】: 2011
【Classification number】: R346


Article No.: 1797855


Article link: http://sikaile.net/xiyixuelunwen/1797855.html

