Research on Key Technologies of Chinese Pronoun Resolution
Published: 2018-04-16 03:13
Topics: pronoun resolution + word vectors; Source: Harbin Institute of Technology, master's thesis, 2017
[Abstract]: Anaphora and ellipsis are pervasive in natural language. They make sentences ambiguous and pose major difficulties for natural language understanding, especially in multi-turn dialogue applications such as chatbots. Anaphora resolution has a long research history: from early theoretical approaches such as hand-written rules, through automatic processing techniques derived from large-scale corpora, to the current introduction of a variety of machine learning methods, the performance of anaphora resolution systems has improved steadily. However, methods for understanding and representing natural language semantics are still immature, and deep linguistic knowledge and semantic features are used only in fairly simple ways. As a result, the distinct characteristics of the word, sentence, and discourse levels have not been explored in sufficient depth, and contextual information has not been used effectively. This thesis aims to improve context understanding in multi-turn dialogue. It studies the key technologies of Chinese pronoun resolution and ellipsis recovery, particularly as used in chatbot systems. The main contributions are as follows.
(1) The thesis proposes a multi-feature fusion algorithm for Chinese pronoun resolution. It introduces several types of features, including vectorized empirical features, semantic role labeling features, and word vectors, to characterize the semantic, structural, and other multi-level properties of mention pairs from multiple angles. It describes in detail the construction and implementation of the overall algorithmic framework for Chinese pronoun resolution based on the mention-pair model. On this basis it examines how the different feature types perform on the task, proposes several feature fusion methods and compares their effectiveness, and, using vector concatenation, verifies how classifier parameters, word vector dimensionality, and the classifier threshold affect the experimental results, from which the best experimental results are obtained.
(2) The thesis introduces deep learning into the pronoun resolution task. Specifically, a long short-term memory (LSTM) network, which is well suited to sequential input, is used to learn deep feature representations of the context of mention pairs, and these are applied to both Chinese pronoun resolution and ellipsis recovery. The thesis proposes a Chinese zero-pronoun identification algorithm based on a bidirectional recurrent network, summarizes the problems found in the zero-pronoun identification task, and proposes corresponding rule-based optimizations. It also studies how deep learning models with different network structures perform on Chinese pronoun ellipsis recovery, and obtains a better model and parameter configuration through comparative experiments.
(3) The thesis implements an intelligent chatbot system on the WeChat platform. It describes the overall structure, module design, and system demonstration in detail, and explains the pronoun resolution and ellipsis recovery modules. It examines in practice the effectiveness of Chinese pronoun resolution and pronoun ellipsis recovery in an intelligent chatbot system, and analyzes and optimizes the semantic completion task specifically.
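The mention-pair formulation with vector concatenation named in contribution (1) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the function names, the logistic scorer, the toy feature values, the weights, and the candidate names are all hypothetical, since the abstract does not specify the classifier or the feature layout.

```python
import math
from typing import List, Optional, Sequence, Tuple

# One candidate antecedent: (name, empirical features, SRL features, word vector).
# This layout is an assumption made for illustration only.
Candidate = Tuple[str, Sequence[float], Sequence[float], Sequence[float]]


def fuse_features(*feature_groups: Sequence[float]) -> List[float]:
    """Fuse feature groups by simple concatenation into one vector."""
    fused: List[float] = []
    for group in feature_groups:
        fused.extend(group)
    return fused


def score_pair(features: Sequence[float], weights: Sequence[float],
               bias: float = 0.0) -> float:
    """Score a (pronoun, candidate) pair with a logistic function.

    The logistic scorer stands in for whatever classifier is actually used.
    """
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def resolve(pronoun_vec: Sequence[float], candidates: Sequence[Candidate],
            weights: Sequence[float], bias: float = 0.0,
            threshold: float = 0.5) -> Optional[str]:
    """Return the best-scoring candidate at or above the threshold, else None."""
    best_name, best_score = None, threshold
    for name, empirical, srl, word_vec in candidates:
        features = fuse_features(empirical, srl, word_vec, pronoun_vec)
        score = score_pair(features, weights, bias)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy data (hypothetical values, chosen by hand rather than learned).
PRONOUN_VEC = [1.0, 0.0]
CANDIDATES: List[Candidate] = [
    ("Xiao Ming", [1.0], [1.0], [1.0, 0.0]),
    ("the book", [0.0], [0.0], [0.0, 1.0]),
]
WEIGHTS = [2.0, 1.0, 1.0, -1.0, 0.0, 0.0]
BIAS = -1.0

if __name__ == "__main__":
    print(resolve(PRONOUN_VEC, CANDIDATES, WEIGHTS, BIAS))
```

Raising `threshold` makes the resolver more conservative: a pronoun is left unresolved (`None`) when no candidate scores high enough, which is the kind of trade-off the abstract refers to when it mentions the classifier threshold.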
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1
Article ID: 1757038
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1757038.html