Research on Disease-Drug Associations Based on Scientific Literature Mining
Topic: scientific literature mining. Focus: biomedical entities. Source: Shanxi Medical University, master's thesis, 2017.
[Abstract]
Objective: Biomedical entities are the names, terms, or concepts of diseases, drugs, genes, and the like that appear in medical research literature. They are a form of knowledge embedded in the literature, and understanding how they relate to one another is of great significance for scientific research. However, much of this knowledge is buried in the sea of literature, and an effective knowledge-management approach is urgently needed to present it quickly to researchers. This study therefore sets out to investigate associations between disease and drug entities through scientific literature mining.
Methods:
1. Literature analysis. Relevant literature was collected, screened, and organized to analyze the history, current state, and open problems of related research in China and abroad. Building on reading, organizing, summarizing, and analyzing this material, and drawing on others' findings, the study's own research framework was developed.
2. Programming languages and database technology. Programming languages such as Java and Python, together with database technology, were used to download the literature records associated with the aforementioned PubMed IDs, organize the data in batches, and build separate databases in MySQL.
3. Biological entity recognition. Disease entities and drug entities were identified with a dictionary-based matching method.
4. Informetrics. Using self-written Python programs, a disease-drug entity co-occurrence network was built on the co-occurrence relation from informetrics, and word-frequency analysis and co-word analysis were applied to analyze disease-drug entity associations.
5. Social network analysis. The social network analysis tool Pajek was used to analyze the co-occurrence network's indicators at the macro and micro levels. At the micro level, indicators such as centrality (degree centrality, closeness centrality, betweenness centrality) were compared. Finally, Gephi was used to visualize the co-occurrence network.
Results and conclusion: The biomedical entity recognition and association discovery methods used in this study can help researchers quickly detect hidden associations in large-scale biomedical text. They generalize well and are equally applicable to analyzing other pairs of biomedical entities, such as disease-gene and gene-drug.
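As a rough illustration of methods 3 to 5, the sketch below matches entities against a dictionary, counts disease-drug co-occurrences per abstract, and computes normalized degree centrality. It is a minimal sketch under stated assumptions, not the thesis's implementation: the vocabularies, the sample abstracts, and the function names (`find_entities`, `build_cooccurrence`, `degree_centrality`) are all hypothetical, and the centrality is computed directly in Python here as a stand-in for the Pajek analysis the abstract describes.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy dictionaries; the thesis does not list its vocabularies.
DISEASES = {"hypertension", "type 2 diabetes", "asthma"}
DRUGS = {"metformin", "amlodipine", "salbutamol"}


def find_entities(text, vocabulary):
    """Return the vocabulary terms that occur in text as whole words or phrases."""
    lowered = text.lower()
    return {
        term
        for term in vocabulary
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def build_cooccurrence(abstracts):
    """Count, for each (disease, drug) pair, the abstracts that mention both."""
    edges = Counter()
    for text in abstracts:
        diseases = find_entities(text, DISEASES)
        drugs = find_entities(text, DRUGS)
        # Each abstract contributes at most 1 to the weight of a given pair.
        edges.update(product(sorted(diseases), sorted(drugs)))
    return edges


def degree_centrality(edges):
    """Normalized degree centrality: distinct neighbours divided by (n - 1)."""
    neighbours = {}
    for disease, drug in edges:
        neighbours.setdefault(disease, set()).add(drug)
        neighbours.setdefault(drug, set()).add(disease)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Made-up abstracts, for illustration only.
abstracts = [
    "Metformin remains first-line therapy for type 2 diabetes.",
    "Amlodipine lowered blood pressure in patients with hypertension and type 2 diabetes.",
    "Metformin use in type 2 diabetes with comorbid hypertension.",
    "Salbutamol relieves acute asthma symptoms.",
]

edges = build_cooccurrence(abstracts)
centrality = degree_centrality(edges)

for (disease, drug), count in edges.most_common():
    print(f"{disease} - {drug}: {count}")
```

Counting each pair once per abstract keeps the edge weight interpretable as "number of documents mentioning both", which is the usual co-word convention. A real pipeline would also need synonym handling and a much larger vocabulary.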
[Degree-granting institution]: Shanxi Medical University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP391.1;R-05