Methods for Recognizing Sentence Relations in Interactive Question Answering
Published: 2018-03-23 06:05
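Topic: question-answer matching relations. Focus: supplementary relations. Source: Harbin Institute of Technology, 2017 master's thesis. Document type: degree thesis.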
[Abstract]: With the development of Internet technology and the rapid growth of information, people urgently need an accurate and efficient way to obtain information. From search engines to intelligent interactive question answering (QA) systems, information access has moved ever closer to natural interaction. Driven by the availability of massive data on the one hand and by substantial progress in machine learning and natural language processing on the other, QA systems have entered a stage of intelligent interactive QA that spans many domains, builds on free text and heterogeneous information, and is generative. Unlike a search engine, a QA system does not require the user to choose among multiple candidate documents; it better understands questions posed in natural language and returns concise, precise answers. With the successful launch of Siri and Watson, intelligent interactive QA has become a research hotspot in recent years and shows growing potential to replace human customer service agents in commercial settings. Building a more intelligent interactive QA system, however, requires learning knowledge from existing customer service logs, and the key to building such a learning system is identifying, within complex interactive QA logs, the matching relations between questions and answers and the supplementary relations between consecutive utterances.

This thesis studies the recognition of sentence matching relations and supplementary relations in interactive QA. For matching customer questions with customer service answers, it builds a CNN-based semantic matching model and an RNN-based generative model. The input layer of each model is the word-vector matrix of a sentence, and the output layer is the confidence of the question-answer match. The models are compared on the SemEval-2016 community QA data and on online customer service dialogue data, with further comparative experiments on question completeness, different structures of the generative model, threshold selection, and the way customer service data are extracted. The results show that the CNN-based matching model outperforms the RNN generative model on the community QA data, while on the customer service dialogue data the RNN-based sequence learning model better captures the contextual information in scenario dialogues. On data organized per dialogue round with complete questions, MAP reaches 84.41%.

For the latent, context-dependent semantic supplementary relations between consecutive utterances in interactive QA, the thesis studies the recognition of sentence supplementary relations. On the deep-model side, it builds parallel CNNs and serially connected LSTMs to extract and model abstract semantic features of sentence pairs. A support vector machine, a CNN-based model, and an RNN-based model are each used to classify the supplementary relation of sentence pairs. The results show that the CNN-based method outperforms the other methods compared, with an F1 score of 67.8%. Finally, supplementary relation recognition and matching relation recognition are combined and applied to semantic matching in interactive QA.
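The abstract reports its results as MAP (for question-answer matching) and F1 (for supplementary-relation classification). The sketch below shows how these two metrics are conventionally computed; it is not taken from the thesis. The function names and the `(confidence, is_correct)` representation of a candidate answer are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one (confidence, is_correct) pair per candidate answer.
Candidate = Tuple[float, bool]


def average_precision(scored: Sequence[Candidate]) -> float:
    """Average precision for one question, ranking candidates by confidence."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            hits += 1
            # Precision at the rank of each correct answer.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(questions: List[Sequence[Candidate]]) -> float:
    """MAP: the mean of per-question average precision."""
    if not questions:
        return 0.0
    return sum(average_precision(q) for q in questions) / len(questions)


def f1_score(predicted: Sequence[bool], gold: Sequence[bool]) -> float:
    """F1 for a binary labelling (True = supplementary relation holds)."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under this sketch, a model's output confidence only matters through the ranking it induces for MAP, whereas F1 depends on a hard decision, which is one reason threshold selection can affect the reported scores.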
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1
Article ID: 1652198
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1652198.html