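Research on Information Resource Integration and Data Mining for Adverse Drug Events
Published: 2018-02-24 11:01
Keywords: drug ontology; adverse drug events; data mining; mapping; aggregation; model; anticancer drugs; classification. Source: Jilin University, doctoral dissertation, 2014. Document type: dissertation.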
[Abstract]: Adverse drug events are becoming an increasingly serious public health problem. Although drugs undergo rigorous adverse event studies before they are marketed, these studies cannot detect every potential adverse event. After the thalidomide disaster of the 1960s, many countries introduced pharmacovigilance systems to monitor marketed drugs. The U.S. Food and Drug Administration (FDA) Adverse Event Reporting System (AERS) database is mainly used to detect rare serious adverse events that went unidentified in clinical trials because of their low frequency, as well as new adverse drug events, i.e., safety signals. If a potential drug safety problem is found in AERS, the FDA conducts epidemiological studies to further evaluate the adverse event and determine whether the drug causes it. Based on its safety evaluation, the FDA may take a range of regulatory actions to improve product safety and protect public health, such as updating drug labeling, restricting a drug's use, communicating new safety information to the public or, in rare cases, withdrawing the drug from the market.
Most current studies of adverse event data mining work with small subsets of the data rather than using and studying the data at scale. Several kinds of research remain difficult: in-depth mining of adverse drug events with respect to drug mechanism of action, pharmacokinetics and physiological effects; comparative data mining of adverse events across a class of drugs; and integration of AERS with other data sources. Yet such large-scale, in-depth mining is important for revealing the adverse event characteristics of different drug classes, the causes of adverse drug events and their genetic associations, and it is an important direction for adverse drug event monitoring and for clinical drug safety research more broadly. The lack of knowledge integration across data resources related to adverse drug events severely limits this research.
Knowledge integration of adverse drug event information resources is both a practical requirement for making effective use of massive medical information resources and a key problem that must be solved to make adverse drug event data mining more efficient. The recent development of drug-domain ontologies has created an opportunity for resource integration research. However, because drug ontologies are complex, data lack standardization and domain ontology mapping is technically difficult, no satisfactory solution has yet been found for knowledge integration and deep aggregation of adverse drug event data. As a result, adverse drug event data mining has not been extended to the use and analysis of large-scale data.
Domain ontologies can provide knowledge-based decision and reasoning support, facilitating large-scale detection of drug safety signals and in-depth mining of adverse drug events. This study uses biomedical domain ontologies to integrate AERS-related information resources, achieving knowledge integration, information aggregation and interoperability with other medical data resources. This enriches the resources available for adverse drug event data mining and facilitates the detection of drug safety signals.
The main contents of this study include:
(1) A drug-domain ontology mapping and aggregation model
Ontology mapping, together with the classification and aggregation of drug information, is a prerequisite for drug-related knowledge-based decision and reasoning support and an important foundation for building a domain knowledge base. It is therefore significant for further in-depth data mining of drug mechanisms of action, pharmacokinetics and physiological effects. Because domain ontologies are structurally complex and heterogeneous with respect to one another, mapping drug-domain ontologies has become one of the difficult problems in ontology mapping. This study proposes a drug-domain ontology mapping and aggregation model and, guided by this model, carries out a case study mapping the drug-domain ontologies RxNorm and NDF-RT (National Drug File - Reference Terminology). The result is a new method for mapping between these two ontologies and for classifying and aggregating drug information. The results show that the model is feasible and demonstrate its practical value in fully reusing multiple ontologies; the model can also deepen the knowledge organization of information resources at the semantic level and promote the construction of semantic systems for digital resources. A limitation is that the model builds on existing ontologies, so gaps in their concept relationships and classification information will ultimately limit the quality of mapping, classification and aggregation. In addition, other properties of domain ontologies may also help improve knowledge organization methods; future work should therefore survey domain ontologies more comprehensively and extract effective common features to refine the model.
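The core idea of the model, borrowing one ontology's classification to group the concepts of another through a mapping, can be illustrated with a small sketch. It assumes plain dictionaries for the mapping table and for the target ontology's class assignments; every code below is a hypothetical placeholder rather than a real RxNorm or NDF-RT identifier, and the thesis's actual mapping rules are not reproduced here.

```python
from collections import defaultdict

# Hypothetical mapping table: source-ontology concept -> target-ontology concepts.
# Codes are placeholders, not real RxNorm or NDF-RT identifiers.
SOURCE_TO_TARGET = {
    "RX:drug-a": ["NDF:ingredient-a"],
    "RX:drug-b": ["NDF:ingredient-b"],
    "RX:drug-c": [],  # no counterpart in the target ontology
}

# Hypothetical classes that only the target ontology provides.
TARGET_CLASSES = {
    "NDF:ingredient-a": {"Class-1"},
    "NDF:ingredient-b": {"Class-1", "Class-2"},
}


def classes_for(concept, source_to_target, target_classes):
    """Collect the target-ontology classes reachable from one source concept."""
    found = set()
    for target in source_to_target.get(concept, []):
        found |= target_classes.get(target, set())
    return found


def aggregate_by_class(concepts, source_to_target, target_classes):
    """Group source concepts under every target class they map to.

    Concepts with no mapping (or whose mapped concepts have no classes)
    are simply left out of the result.
    """
    groups = defaultdict(set)
    for concept in concepts:
        for cls in classes_for(concept, source_to_target, target_classes):
            groups[cls].add(concept)
    return dict(groups)


if __name__ == "__main__":
    groups = aggregate_by_class(SOURCE_TO_TARGET, SOURCE_TO_TARGET, TARGET_CLASSES)
    for cls in sorted(groups):
        print(cls, sorted(groups[cls]))
```

Run as a script, this prints `Class-1 ['RX:drug-a', 'RX:drug-b']` and `Class-2 ['RX:drug-b']`; `RX:drug-c` drops out because it has no counterpart, which mirrors the stated limitation that gaps in the existing ontologies cap what the mapping can classify.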
(2) A preliminary study of RxNorm-based normalization of AERS drug names
Investigating how many AERS drug names are covered by RxNorm is the first, and a crucial, step in exploring how to make full use of RxNorm in AERS data mining.
This study calculated the proportion of all AERS drug names from 2004 to 2010 that exactly match RxNorm and compared it with UMLS. RxNorm and UMLS covered 13,565 (4.8%) and 21,272 (7.5%) of the unique AERS drug names by exact match, respectively. The 2011AA release of UMLS integrates 160 source vocabularies; its coverage of AERS drug names comes from various source vocabularies, including RxNorm, which contributes the largest number of mappings. We then manually analyzed the 200 unmapped high-frequency AERS drug names (frequency greater than 1,000) and 388 randomly selected low-frequency drug names (frequency less than 1,000) to investigate why some names were not mapped. Although AERS data come from a wide range of sources and contain entry errors, the high-frequency names still reveal domain-specific vocabulary usage. Our study provides a basis for improving the RxNorm ontology. This chapter also provides the rationale for choosing the RxNorm-based natural language processing tool MedEx in the next chapter.
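The coverage calculation and the frequency-based selection of unmapped names can be sketched as follows. This is a minimal illustration, assuming drug names arrive as plain strings and that "exact match" tolerates differences in case and whitespace (the `normalize` helper is a hypothetical choice; the thesis's matching rule may be stricter). The sample size of 388 and the threshold of 1,000 come from the text; the fixed random seed is an assumption added for reproducibility.

```python
import random
from collections import Counter


def normalize(name):
    """Assumed light normalization: collapse whitespace and ignore case."""
    return " ".join(name.split()).upper()


def exact_match_coverage(report_names, vocabulary):
    """Return (matched, unique, share) for unique report names found in a vocabulary."""
    vocab = {normalize(v) for v in vocabulary}
    unique_names = {normalize(n) for n in report_names}
    matched = unique_names & vocab
    share = len(matched) / len(unique_names) if unique_names else 0.0
    return len(matched), len(unique_names), share


def unmapped_names(report_names, vocabulary, threshold=1000, sample_size=388, seed=0):
    """Split unmapped names into all high-frequency names (count > threshold),
    most frequent first, and a reproducible random sample of low-frequency
    names (count < threshold). Names whose count equals the threshold fall
    into neither group, mirroring the "greater than"/"less than" wording."""
    vocab = {normalize(v) for v in vocabulary}
    counts = Counter(normalize(n) for n in report_names)
    unmapped = {n: c for n, c in counts.items() if n not in vocab}
    high = sorted((n for n, c in unmapped.items() if c > threshold),
                  key=lambda n: (-unmapped[n], n))
    low = sorted(n for n, c in unmapped.items() if c < threshold)
    low_sample = random.Random(seed).sample(low, min(sample_size, len(low)))
    return high, low_sample
```

On a toy input such as `["Aspirin", "aspirin ", "ASA 81MG", "ASA 81MG", "ASA 81MG", "Tylenol", "vit d"]` against the vocabulary `["aspirin", "Tylenol"]`, `exact_match_coverage` reports 2 of 4 unique names matched, and with `threshold=2` the unmapped "ASA 81MG" lands in the high-frequency list while "VIT D" is the only low-frequency name.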
(3) Building the knowledge-integrated data mining database AERS-DM
Building on the investigation of AERS drug-name normalization, we used the natural language processing tool MedEx to normalize the drug names in AERS and evaluated its natural language processing performance. Based on the drug-domain ontology mapping and aggregation model, a greedy algorithm was used to aggregate AERS drug names into the drug classification information in RxNorm and NDF-RT. Adverse events were mapped to MedDRA Preferred Term (PT) and System Organ Class (SOC) codes for aggregation. Finally, we built the open-source drug-adverse event data mining knowledge integration database AERS-DM (address: http://informatics.mayo.edu/adepedia/index.php/Download), and a case study confirmed the data mining value of the AERS-DM dataset.
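The adverse event side of the aggregation, rolling MedDRA PTs up to SOCs, can be sketched with a lookup table. The table below is an illustrative excerpt only (full MedDRA is licensed). In MedDRA each PT has exactly one primary SOC; aggregating on the primary SOC is an assumption of this sketch, since the text does not say how PTs linked to several SOCs were handled.

```python
from collections import Counter

# Illustrative excerpt of a PT -> primary SOC table (not the full, licensed MedDRA).
PT_TO_PRIMARY_SOC = {
    "Nausea": "Gastrointestinal disorders",
    "Vomiting": "Gastrointestinal disorders",
    "Neutropenia": "Blood and lymphatic system disorders",
}


def aggregate_to_soc(pt_terms, pt_to_soc):
    """Count adverse event mentions per SOC and list the PTs that could not be mapped."""
    soc_counts = Counter()
    unmapped = []
    for pt in pt_terms:
        soc = pt_to_soc.get(pt)
        if soc is None:
            unmapped.append(pt)  # kept aside rather than silently dropped
        else:
            soc_counts[soc] += 1
    return soc_counts, unmapped
```

Keeping unmapped PTs separate makes it easy to report the share of PTs that could be aggregated, the kind of figure the dataset description gives for MedDRA coverage.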
AERS-DM integrates drug and adverse event knowledge bases. With normalized codes and aggregation capabilities, it provides additional resources for mining AERS drug safety signals and for related data mining. The dataset contains two tables: one stores the normalized drug and adverse event information, and the other stores the aggregated drug and adverse event information. It contains 37,029,228 drug-adverse event records in total. AERS drug names were normalized to 14,490 RxNorm drug names (represented by RxNorm codes), of which 10,221 (71%) could be assigned to NDF-RT classes. On the adverse event side, 14,740 MedDRA PT terms were aggregated to MedDRA SOC codes, accounting for 76% of all PT terms in MedDRA. AERS-DM contains 4,639,613 unique pairs of an RxNorm-coded drug name and a MedDRA PT, i.e., unique pairings of a normalized drug name with an adverse event. After adverse events are aggregated by organ system, there are 205,725 drug-organ system pairs.
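The drop from 4,639,613 unique drug-PT pairs to 205,725 drug-organ system pairs follows from many PTs collapsing into the same SOC. A minimal sketch of that counting step is shown below, assuming a hypothetical row layout of (report ID, RxNorm code, MedDRA PT) for the normalized table; the actual AERS-DM column layout is not described in the abstract. The `percent` helper simply reproduces the 71% figure from the counts given in the text.

```python
def unique_pair_counts(rows, pt_to_soc):
    """Count unique (drug, PT) pairs and unique (drug, SOC) pairs.

    rows: iterable of (report_id, rxnorm_code, meddra_pt) tuples, a hypothetical
    layout for the normalized table. PTs missing from pt_to_soc are skipped
    when building the SOC-level pairs.
    """
    drug_pt = {(drug, pt) for _, drug, pt in rows}
    drug_soc = {(drug, pt_to_soc[pt]) for drug, pt in drug_pt if pt in pt_to_soc}
    return len(drug_pt), len(drug_soc)


def percent(part, whole):
    """Share of `whole` as a whole-number percentage."""
    return round(100 * part / whole)
```

For example, `percent(10221, 14490)` returns 71, matching the share of normalized drug names that could be assigned to NDF-RT classes.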
(4) An empirical data mining study with the AERS-DM knowledge integration database
AERS-DM is a normalized and aggregated knowledge integration database for data mining. Its strengths are the normalization of drug data and the classification and aggregation of both drug data and adverse event data, all of which derive from the knowledge structures contained in biomedical ontologies. Traditional AERS-based adverse event detection studies have mostly targeted a small number of drugs, and few have mined the data at scale. In this study, we used ingredient information for commonly used anticancer drugs to carry out a large-scale systematic analysis of drug clustering by mechanism of action, physiological effect and therapeutic intent, of adverse event clustering, and of age- and sex-related differences in adverse drug events, further demonstrating the semantic mining potential of AERS-DM.
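One possible way to relate drug clustering to adverse event clustering is to ask whether drugs in the same class (for example, the same mechanism-of-action class) have more similar adverse event profiles than drugs in different classes. The sketch below is a hypothetical illustration of that comparison using cosine similarity over per-SOC report counts; the abstract does not state which similarity measure or clustering method the study actually used.

```python
import math
from itertools import combinations
from statistics import mean


def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def within_vs_between(profiles, drug_class):
    """Mean profile similarity for drug pairs in the same class vs in different classes.

    profiles: drug -> list of report counts per SOC (same SOC order for every drug).
    drug_class: drug -> class label, e.g. a hypothetical mechanism-of-action class.
    Returns (within_mean, between_mean); an empty group yields NaN.
    """
    within, between = [], []
    for a, b in combinations(sorted(profiles), 2):
        sim = cosine(profiles[a], profiles[b])
        (within if drug_class[a] == drug_class[b] else between).append(sim)

    def safe_mean(values):
        return mean(values) if values else float("nan")

    return safe_mean(within), safe_mean(between)
```

A within-class mean clearly above the between-class mean would suggest that the ontology-derived drug classes line up with adverse event patterns; stratifying the input profiles by age group or sex would extend the same comparison to demographic differences.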
Traditional adverse event detection relies on disproportionality measures, which mainly quantify how "unexpected" a drug-adverse event association is and attempt to compensate for the fact that spontaneous reporting systems lack background information on disease incidence. In this study we propose a new adverse event detection method that links AERS data with electronic medical record data to obtain disease incidence information for adverse events, enabling large-scale comparative studies of adverse drug events. This study shows that AERS-DM, as an enhanced version of AERS, is a rich resource for data mining.
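For reference, two widely used disproportionality measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), are computed from a 2x2 table of report counts. The sketch below shows the standard formulas; the abstract does not say which measures the study used, so the choice of PRR and ROR here is an assumption.

```python
def disproportionality(a, b, c, d):
    """Proportional reporting ratio (PRR) and reporting odds ratio (ROR).

    2x2 report counts:
        a: target drug, target event     b: target drug, other events
        c: other drugs, target event     d: other drugs, other events
    Returns (prr, ror); a ratio is None when its denominator would be zero.
    """
    prr = None
    if a + b and c and c + d:
        # share of the drug's reports mentioning the event, relative to all other drugs
        prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c) if b and c else None
    return prr, ror
```

With a = 20, b = 80, c = 100 and d = 9,800, the event appears in 20% of the drug's reports versus about 1% of other drugs' reports, giving a PRR of 19.8 and an ROR of 24.5. Neither ratio uses the background disease incidence, which is the gap the EHR linkage described above is meant to fill.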
The innovations of this paper include:
(1) Theoretical innovation
A drug-domain ontology mapping and aggregation model is proposed. Because of the limitations of ontology development, current domain ontologies each have their own characteristics. The proposed model makes full use of these differences: through ontology mapping, the classification information of one ontology complements the content of the others, providing classification and aggregation across multiple ontologies in a domain. This saves ontology development costs and enables full reuse of existing ontologies.
(2) Methodological innovation
(i) Based on the drug-domain ontology mapping and aggregation model, we developed a systematic classification and aggregation algorithm that uses NDF-RT and RxNorm to classify and aggregate the drugs in the AERS database. The methodological innovation has two aspects: ① the rich relationships in RxNorm are used to infer terms that can be mapped to the NDF-RT ontology and then classified further; ② both clinical drug names and generic drug names are used to find NDF-RT's multi-axis classifications, avoiding classes that could be missed if generic names alone were used for mapping. Compared with other existing methods, this method can handle more complex cases.
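The two ideas above, following RxNorm relationships to reach terms that NDF-RT can classify and collecting classes from both the clinical drug and its ingredient, can be sketched as a graph walk. Everything in the toy data is hypothetical: the concept names are placeholders, the relation labels are modeled loosely on RxNorm's and simplified (real RxNorm links clinical drugs to ingredients through intermediate concepts), and the axis names follow the mechanism of action, physiological effect and therapeutic intent axes mentioned in the abstract. This is not the thesis's actual algorithm.

```python
from collections import deque

# Hypothetical, simplified RxNorm-style graph: concept -> [(relation, target concept)].
EDGES = {
    "BN:BrandX": [("tradename_of", "SCD:drugx 10 MG Oral Tablet")],
    "SCD:drugx 10 MG Oral Tablet": [("has_ingredient", "IN:drugx")],
}

# Hypothetical NDF-RT-style classes attached to concepts, as (axis, class) pairs.
NDFRT = {
    "SCD:drugx 10 MG Oral Tablet": [("TherapeuticIntent", "may_treat: Condition-1")],
    "IN:drugx": [("MoA", "MoA-1"), ("PE", "PE-1")],
}

FOLLOW = {"tradename_of", "has_ingredient"}  # relations the walk may traverse


def reachable(start, edges, follow=FOLLOW):
    """Breadth-first walk over the allowed relations, returning every concept reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for relation, target in edges.get(node, []):
            if relation in follow and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def classes_by_axis(start, edges, ndfrt, follow=FOLLOW):
    """Union the classes found on every reached concept, grouped by axis."""
    by_axis = {}
    for concept in reachable(start, edges, follow):
        for axis, cls in ndfrt.get(concept, []):
            by_axis.setdefault(axis, set()).add(cls)
    return by_axis
```

Starting from `"BN:BrandX"` yields classes on all three axes, while starting from the ingredient `"IN:drugx"` alone yields only the MoA and PE classes. This is the kind of omission the text describes when generic names are used on their own.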
(ii) Natural language processing tools and biomedical ontologies are used to normalize and aggregate large-scale AERS data, which makes large-scale signal detection of adverse drug events possible. On this basis, a new adverse event detection method is implemented that links AERS data with electronic medical record data to obtain disease incidence information for adverse events, enabling large-scale comparative studies of adverse drug events.
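The abstract does not spell out how disease incidence is derived from the electronic medical records or how it is combined with AERS. The sketch below therefore shows only the linkage step, under stated assumptions: both sources are coded with the same event vocabulary (for example, MedDRA PTs), and the EHR is reduced to a hypothetical mapping from patient ID to event codes. The resulting proportion is a crude prevalence-style proxy, not a true incidence rate.

```python
def event_proportion(ehr_patients, event_code):
    """Fraction of EHR patients whose record contains event_code.

    ehr_patients: patient id -> set of event codes (hypothetical schema).
    A true incidence rate would also need onset dates and person-time.
    """
    if not ehr_patients:
        return 0.0
    hits = sum(1 for codes in ehr_patients.values() if event_code in codes)
    return hits / len(ehr_patients)


def annotate_with_background(drug_event_pairs, ehr_patients):
    """Attach the EHR-derived background proportion to each (drug, event) pair."""
    cache = {}  # compute each event's proportion only once
    annotated = []
    for drug, event in drug_event_pairs:
        if event not in cache:
            cache[event] = event_proportion(ehr_patients, event)
        annotated.append((drug, event, cache[event]))
    return annotated
```

Once each AERS drug-event pair carries a background rate, pairs involving common conditions can be told apart from those involving rare ones, which spontaneous reports alone cannot do.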
[Degree-granting institution]: Jilin University
[Degree level]: Doctorate
[Year conferred]: 2014
[Classification number]: R95