Disease Name Extraction Based on Multi-Label CRF
Published: 2018-08-30 12:48
[Abstract]: Named entity recognition in biomedical text is important for building and mining large clinical databases that support clinical decision-making, and recognizing disease names is one of its basic tasks. Medical text contains many composite disease names, from which the individual entities are hard to separate and extract. To address this problem, a multi-label conditional random field (CRF) algorithm is proposed. The data are first annotated with multiple layers of labels, each layer targeting a different disease within a composite disease name; the model is then trained on the merged final labels; finally, the labels predicted by the model are separated back into layers. The method can recognize composite disease names that the traditional CRF algorithm cannot, and the experimental results confirm its effectiveness.
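The label merging and separation steps described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, since the abstract does not give the tag set or the merging format: the `B`/`I`/`O` tags, the `|` separator, the function names, and the example phrase are all hypothetical, and the CRF training step itself is omitted.

```python
from typing import List

SEP = "|"  # hypothetical separator between per-layer tags


def merge_layers(layers: List[List[str]]) -> List[str]:
    """Combine per-layer tag sequences into one composite tag per token."""
    if len({len(layer) for layer in layers}) != 1:
        raise ValueError("all label layers must have the same length")
    return [SEP.join(tags) for tags in zip(*layers)]


def split_layers(merged: List[str]) -> List[List[str]]:
    """Separate composite tags (e.g. model predictions) back into layers."""
    return [list(layer) for layer in zip(*(tag.split(SEP) for tag in merged))]


def decode_layer(tokens: List[str], tags: List[str]) -> str:
    """Join the tokens that one layer marks as part of a disease name."""
    return " ".join(tok for tok, tag in zip(tokens, tags) if tag != "O")


# Assumed example: one composite name that contains two diseases.
tokens = ["liver", "and", "kidney", "failure"]
layer1 = ["B", "O", "O", "I"]  # layer 1 targets "liver failure"
layer2 = ["O", "O", "B", "I"]  # layer 2 targets "kidney failure"

merged = merge_layers([layer1, layer2])     # labels a CRF would be trained on
recovered = split_layers(merged)            # applied to predicted labels
names = [decode_layer(tokens, layer) for layer in recovered]
print(merged)
print(names)
```

In this sketch a single sequence labeler only ever sees one tag per token, yet each layer can still be decoded on its own, which is how the shared token "failure" ends up in both disease names.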
[Author affiliation]: School of Computer Science, Wuhan University
[Classification number]: TP391
Article ID: 2213113
Article link: http://sikaile.net/xiyixuelunwen/2213113.html