Research on Chinese Named Entity Recognition Based on Neural Networks
Published: 2018-05-29 01:01
Topics: Chinese named entity recognition + deep learning; Source: Nanjing Normal University, master's thesis, 2017
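[Abstract]: Named entity recognition (NER) is the task of identifying proper nouns such as person names, place names, and organization names in text. It is one of the key technologies of natural language processing and an important foundation for applications such as information extraction, question answering, and machine translation. Traditional NER methods based on statistical learning models usually rely on feature engineering: the features strongly affect system performance, yet designing feature templates requires substantial manual effort and expert knowledge. To reduce this dependence on hand-crafted features, this thesis applies deep learning and, taking the characteristics of the Chinese NER task into account, studies neural-network-based methods for Chinese NER. The main work of this thesis is as follows.
(1) Focusing on the NER task and on deep learning, it discusses and analyzes the difficulties of the task, common NER research methods, the fundamentals of deep learning, word embeddings, and commonly used neural network models.
(2) It implements a baseline Chinese NER system based on neural character tagging. The method uses a bidirectional long short-term memory (BiLSTM) model and treats Chinese NER as a sequence labeling problem: the character embeddings of a Chinese sentence serve as input features so that the surrounding context is fully taken into account, and entities are recognized by assigning a tag to every character in the sequence.
(3) It presents an exploratory study of neural segment-level Chinese NER. Because Chinese text has no delimiters between words, Chinese NER must both segment the given character sequence and classify the entities. Assigning a label to each segment as a whole is more reasonable than assigning labels to individual characters, because it avoids a weakness of character-level sequence labeling, in which entity boundaries are determined only by local tags. This thesis proposes, for the first time, a neural segment-level method for Chinese NER: it uses two neural model architectures that combine a neural network with a semi-Markov conditional random field (semi-CRF) and performs recognition by labeling each segment as a whole.
A series of experiments was conducted on the proposed methods. The results show that the neural segment-level method achieves a significant performance improvement over the baseline system.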
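Item (2) frames Chinese NER as character-level sequence labeling. The abstract does not say which tag scheme the baseline uses; the minimal sketch below assumes the common BIO scheme, with hypothetical entity types `PER` and `ORG` and hypothetical helper names `spans_to_bio` and `bio_to_spans`. It covers only the encoding and decoding around the tagger, not the BiLSTM itself. The decoder also illustrates the weakness raised in item (3): entity boundaries are recovered from local per-character tags, so an ill-formed sequence (for example an `I-` tag with no preceding `B-`) has to be repaired by a heuristic.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end exclusive, entity type) over character indices


def spans_to_bio(length: int, spans: List[Span]) -> List[str]:
    """Encode entity spans as one BIO tag per character (hypothetical helper)."""
    tags = ["O"] * length
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode per-character BIO tags back into entity spans (hypothetical helper).

    Boundaries are inferred only from neighbouring tags. An I- tag that does not
    continue an open entity of the same type is treated as the start of a new
    entity, which is one common repair heuristic (assumed here).
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label = ""
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        prefix, _, typ = tag.partition("-")
        continues = prefix == "I" and start is not None and typ == label
        if start is not None and not continues:
            spans.append((start, i, label))  # close the open entity
            start = None
        if prefix == "B" or (prefix == "I" and not continues):
            start, label = i, typ  # open a new entity here
    return spans


if __name__ == "__main__":
    sentence = "小明在南京师范大学读书"  # "Xiao Ming studies at Nanjing Normal University"
    tags = spans_to_bio(len(sentence), [(0, 2, "PER"), (3, 9, "ORG")])
    print(tags)
    # ['B-PER', 'I-PER', 'O', 'B-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'I-ORG', 'O', 'O']
    print(bio_to_spans(tags))  # [(0, 2, 'PER'), (3, 9, 'ORG')]
```

Because each tag is predicted per character, nothing in this formulation by itself forces the tagger's output to be a well-formed BIO sequence; the repair rule in `bio_to_spans` is only one of several possible choices.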
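Item (3) instead assigns a label to each segment as a whole by combining a neural network with a semi-Markov CRF. In a semi-CRF, a labeled segmentation of the sentence is scored as the sum of its segment scores, and the best one can be found by dynamic programming over segment end positions: best[j] = max over start positions i < j (within a maximum segment length) and labels y of best[i] + score(i, j, y). The sketch below shows only this decoding step, under stated assumptions: segment scores come from a plain Python function standing in for the neural scorer (the abstract does not describe the thesis's two architectures), label-transition scores and training are omitted, non-entity `O` segments are restricted to length 1, and the names `semicrf_viterbi`, `toy_score`, and all score values are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Segment = Tuple[int, int, str]  # (start, end exclusive, label)


def semicrf_viterbi(
    n: int,
    labels: List[str],
    segment_score: Callable[[int, int, str], float],
    max_len: int,
) -> Tuple[float, List[Segment]]:
    """Highest-scoring segmentation of n characters into labeled segments.

    best[j] is the best score of any labeled segmentation of the prefix [0, j);
    each segment (i, j, y) adds segment_score(i, j, y). "O" segments are limited
    to length 1 (an assumption of this sketch).
    """
    best = [-math.inf] * (n + 1)
    back: List[Tuple[int, str]] = [(0, "")] * (n + 1)
    best[0] = 0.0
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            for y in labels:
                if y == "O" and j - i > 1:
                    continue
                cand = best[i] + segment_score(i, j, y)
                if cand > best[j]:
                    best[j] = cand
                    back[j] = (i, y)
    # Follow back-pointers from the end of the sentence to recover segments.
    segments: List[Segment] = []
    j = n
    while j > 0:
        i, y = back[j]
        segments.append((i, j, y))
        j = i
    segments.reverse()
    return best[n], segments


# Hypothetical scores for "小明在南京师范大学读书" (11 characters).
TOY_SCORES: Dict[Segment, float] = {(0, 2, "PER"): 3.0, (3, 9, "ORG"): 8.0, (3, 5, "LOC"): 2.5}


def toy_score(i: int, j: int, y: str) -> float:
    """Stand-in for a neural segment scorer: fixed bonus per O character."""
    if y == "O":
        return 0.5
    return TOY_SCORES.get((i, j, y), -1.0)


if __name__ == "__main__":
    print(semicrf_viterbi(11, ["O", "PER", "LOC", "ORG"], toy_score, max_len=6))
    # (12.5, [(0, 2, 'PER'), (2, 3, 'O'), (3, 9, 'ORG'), (9, 10, 'O'), (10, 11, 'O')])
```

Because labels are attached to whole segments, the output is a set of non-overlapping, well-formed entities by construction. The cost is that decoding scores O(n · max_len · |labels|) candidate segments, so `max_len` bounds both runtime and the longest recognizable entity: with `max_len=5`, the six-character ORG segment above can no longer be chosen.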
[Degree-Granting Institution]: Nanjing Normal University
[Degree Level]: Master's
[Year Degree Conferred]: 2017
[Classification Number]: TP391.1
Article ID: 1948888
Link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1948888.html