Chinese Named Entity Recognition for the Medical Domain
[Abstract]: With the explosive growth of text data in recent years and the construction and spread of large-scale knowledge bases, named entity recognition (NER) has become a research hotspot in natural language processing. However, traditional supervised learning methods require large annotated corpora. In the medical domain, where annotated data is scarce, traditional NER methods cannot achieve the desired results. With the development and spread of deep learning, recurrent neural networks (RNN), and especially long short-term memory (LSTM) networks, have been widely used in natural language processing, and in many research directions their results are significantly better than those of traditional methods. We therefore first apply an LSTM model to named entity recognition in the medical domain and show that it outperforms the traditional conditional random field (CRF) model, both in evaluation results and in practical application. Because standard annotated corpora in the medical domain are relatively scarce, we then seek to integrate external information into the LSTM model, building on its advantage over the CRF model: the model learns linguistic features from the news domain together with unsupervised semantic information from the medical domain. Using transfer learning and pre-training from deep learning, we fuse parameters and optimize the model for the medical domain, which further improves its effectiveness. Finally, because the LSTM model has shortcomings in practical application, we explore another method for domain-adaptive named entity recognition.
To identify the differences between knowledge domains, we conduct comparative experiments on mixed corpora from different domains. We then study named entity recognition with a GBDT (gradient boosting decision tree) model that integrates domain-difference features and unsupervised semantic vectors from the medical domain, and obtain good results.
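The abstract does not say how parameters are fused during transfer learning. As an illustration only, one common approach is a warm start: parameters trained on the source (news) domain are copied into the target (medical) model wherever names and sizes match, and the rest keep their own initialization. The following minimal sketch assumes this approach; the function `init_from_source`, the parameter names, and all values are hypothetical and are not taken from the thesis.

```python
from typing import Dict, List

# Hypothetical representation: each named parameter is a flat list of weights.
Params = Dict[str, List[float]]


def init_from_source(target: Params, source: Params) -> Params:
    """Warm-start a target-domain model from a source-domain model.

    A parameter is copied from the source only if the target has a
    parameter with the same name and the same number of weights.
    """
    fused: Params = {}
    for name, values in target.items():
        src = source.get(name)
        if src is not None and len(src) == len(values):
            fused[name] = list(src)      # reuse weights learned on the source domain
        else:
            fused[name] = list(values)   # keep the target's own initialization
    return fused


# Assumed toy weights for a model trained on news text.
news_params: Params = {
    "embedding": [0.2, -0.1, 0.4],
    "lstm": [0.5, 0.3],
    "output": [0.1, 0.9],
}

# Assumed freshly initialized medical model; its output layer is larger
# because the medical label set is assumed to differ from the news one.
medical_params: Params = {
    "embedding": [0.0, 0.0, 0.0],
    "lstm": [0.0, 0.0],
    "output": [0.0, 0.0, 0.0],
}

fused = init_from_source(medical_params, news_params)
```

In this sketch the embedding and LSTM weights are transferred, while the output layer stays at its own initialization because its size differs. The fused model would then be fine-tuned on the scarce medical annotations.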
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1;TP18