Research and Design of a Python-Based Web Crawler for Gene Expression Data

Published: 2018-05-18 04:30

Topic: GEO database + web crawler; Source: master's thesis, Shanxi Medical University, 2017

[Abstract]
Objective: Using NCBI's open Gene Expression Omnibus (GEO) database as the target, to develop a crawler program that addresses the problems created by the ever-growing volume of high-throughput gene expression data. The crawler should let users mine and process this information without being overwhelmed by its volume. It should also improve utilization of the database, reduce the waste of biomedical information resources, give medical workers comprehensive gene expression data, and advance clinical bioinformatics.

Methods:
1. Literature review: we reviewed the literature on web crawler systems, web page scraping techniques, and the GEO database to understand the state of crawler systems, page-scraping strategies, and the development of GEO. This review provided the theoretical basis and practical experience for designing a crawler specifically for retrieving RNA-related data from GEO.
2. Programming language: the crawler was written in Python.
3. Database technology: a MySQL database stores the gene expression data retrieved by the crawler.

Results:
1. A crawler program was successfully developed and put into operation.
2. The crawler retrieved all gene expression data in the GEO database, 71,032 entries in total, and saved them in a MySQL database.

Conclusion: The crawler automatically retrieves gene-expression-related data from GEO, removing the tedium of manual downloads and making large-scale download practical. It efficiently extracts useful information and biological knowledge from the database's large volume of data and helps clinical researchers browse the biomedical literature. Because it supports batch download of data resources, it greatly eases lookup and reuse of information in biological research. The retrieved data not only substantially advance basic medical research but also matter for the prevention and treatment of human disease and for gene mapping.
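The abstract describes the pipeline only at a high level: a Python crawler that writes to MySQL. The following is a minimal sketch of one fetch-parse-store cycle. It rests on assumptions that the thesis does not state:

- Records are requested from GEO's `acc.cgi` page in SOFT text format (`form=text`, `view=brief`).
- Only the `!Series_title`, `!Series_geo_accession` and `!Series_summary` fields are kept.
- Python's built-in `sqlite3` stands in for MySQL, so the example runs without a database server.

The functions `series_url`, `fetch_soft`, `parse_soft_series` and `store`, and the table `geo_series`, are hypothetical names used for illustration.

```python
"""Minimal GEO series crawler sketch: fetch SOFT text, parse it, store it.

Assumptions (not from the thesis): records come from GEO's acc.cgi page in
SOFT text form, and SQLite stands in for MySQL so the sketch runs without a
database server.
"""
import sqlite3
import time
import urllib.parse
import urllib.request

# Assumed endpoint: GEO's accession display page, which can return SOFT text.
GEO_ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"


def series_url(accession: str) -> str:
    """Build the URL for a brief SOFT-format view of one GEO record."""
    query = urllib.parse.urlencode(
        {"acc": accession, "targ": "self", "form": "text", "view": "brief"}
    )
    return f"{GEO_ACC_URL}?{query}"


def fetch_soft(accession: str, delay: float = 0.5) -> str:
    """Download one record as text, sleeping first to limit the request rate."""
    time.sleep(delay)
    with urllib.request.urlopen(series_url(accession), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_soft_series(text: str) -> dict:
    """Collect '!Series_<field> = <value>' lines into a dict.

    Fields that repeat (for example several !Series_summary lines) are
    joined with single spaces; all other lines are ignored.
    """
    prefix = "!Series_"
    record = {}
    for line in text.splitlines():
        if not line.startswith(prefix) or " = " not in line:
            continue
        key, value = line[len(prefix):].split(" = ", 1)
        record[key] = f"{record[key]} {value}" if key in record else value
    return record


def store(conn: sqlite3.Connection, record: dict) -> None:
    """Insert or update one series row, keyed by its GEO accession."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS geo_series ("
        "accession TEXT PRIMARY KEY, title TEXT, summary TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO geo_series (accession, title, summary) "
        "VALUES (?, ?, ?)",
        (record.get("geo_accession"), record.get("title"), record.get("summary")),
    )
    conn.commit()
```

To use the sketch, open a connection with `sqlite3.connect("geo.db")`. Then call `store(conn, parse_soft_series(fetch_soft(acc)))` for each accession `acc` in a list, for example `GSE1234`.

Porting the sketch to MySQL through a DB-API driver such as PyMySQL mostly means two changes:

- Switch the `?` placeholders to `%s`.
- Replace `INSERT OR REPLACE` with MySQL's `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.

The fixed `delay` is only a simple courtesy throttle. A real crawler should follow NCBI's published usage guidelines.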
[Degree-granting institution]: Shanxi Medical University
[Degree]: Master's
[Year awarded]: 2017
[Classification number]: Q811.4
