Disease Name Extraction Based on Multi-Label CRF
Published: 2018-08-30 12:48
[Abstract]: Named entity recognition in biomedical text is important for building and mining large clinical databases that support clinical decision-making, and recognizing disease names is one of its basic tasks. Medical text contains many composite disease names, from which the individual entities are difficult to separate and extract. To address this problem, a multi-label conditional random field (CRF) algorithm is proposed. The data is first annotated with multiple layers of labels, each layer targeting a different disease within a composite disease name; the model is then trained on the merged final labels; finally, the labels predicted by the model are separated back into layers. This method can recognize composite disease names that the traditional conditional random field algorithm cannot, and the experimental results verify the effectiveness of the proposed algorithm.
[Author affiliation]: School of Computer Science, Wuhan University
[Classification number]: TP391
Article ID: 2213113
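The label merging and separation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the tag scheme, so everything here is an assumption made for illustration: a B/I/O tag per layer, a `|` separator for the merged tag, the helper names `merge_layers`, `split_layers`, and `extract_entities`, and the example phrase. The CRF training step itself is omitted; any sequence labeler would be trained on the merged tags and its predictions passed to `split_layers`.

```python
from typing import List

SEP = "|"  # hypothetical separator between per-layer tags


def merge_layers(layers: List[List[str]]) -> List[str]:
    """Combine per-layer tags into one composite tag per token."""
    if len({len(layer) for layer in layers}) != 1:
        raise ValueError("all layers must have the same length")
    return [SEP.join(tags) for tags in zip(*layers)]


def split_layers(merged: List[str]) -> List[List[str]]:
    """Recover the per-layer tag sequences from composite tags."""
    return [list(layer) for layer in zip(*(tag.split(SEP) for tag in merged))]


def extract_entities(tokens: List[str], layer: List[str]) -> List[str]:
    """Collect entities in one layer.

    Assumption: B starts an entity and later I tags extend it,
    even across O gaps, so discontinuous names can be recovered.
    """
    entities, current = [], []
    for token, tag in zip(tokens, layer):
        if tag == "B":
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
    if current:
        entities.append(" ".join(current))
    return entities


# Illustrative composite name: two diseases sharing the word "cancer".
tokens = ["breast", "and", "ovarian", "cancer"]
layer1 = ["B", "O", "O", "I"]  # layer 1 marks "breast ... cancer"
layer2 = ["O", "O", "B", "I"]  # layer 2 marks "ovarian cancer"

merged = merge_layers([layer1, layer2])   # tags a model would be trained on
recovered = split_layers(merged)          # applied to predicted tags
names = [n for layer in recovered for n in extract_entities(tokens, layer)]
print(merged)
print(names)
```

Merging the layers into a single tag keeps the task a standard single-sequence labeling problem, at the cost of a larger tag set; a single-layer B/I/O scheme could not assign the shared token "cancer" to both names.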
Article link: http://sikaile.net/xiyixuelunwen/2213113.html