Methods for Recognizing Sentence Relations in Interactive Question Answering
Published: 2018-03-23 06:05
Topic: question-answer matching relation   Focus: complementary relation   Source: Harbin Institute of Technology, master's thesis, 2017   Type: degree thesis
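【Abstract】: With the development of Internet technology and the rapid growth of information, people urgently need an accurate and efficient way to obtain information. From search engines to intelligent interactive question answering (QA) systems, information access has moved ever closer to natural interaction. Two factors have driven this shift: the availability of massive data, and substantial advances in machine learning and natural language processing. Together they have carried QA systems into a stage of intelligent, interactive, generation-based question answering that serves many domains and draws on free text and heterogeneous information. Unlike a search engine, a QA system does not make users choose among multiple candidate documents. It better understands questions expressed in natural language and returns concise, precise answers. Following the success of Siri and Watson, intelligent interactive QA systems have become a research hotspot in recent years, and they increasingly show commercial potential to replace human customer service agents. However, building a more intelligent interactive QA system depends heavily on learning knowledge from existing customer service logs. The key to building such a learning system is identifying, in complex interactive QA customer service logs, two kinds of relation: the matching relation between questions and answers, and the complementary relation between consecutive utterances. This thesis studies the recognition of both in interactive QA. To match customer questions with agent answers, the thesis builds a CNN-based semantic matching model and an RNN-based generative model. In both models, the input layer is the word-vector matrix of a sentence and the output layer is a confidence score for the question-answer match. The models are compared on the SemEval-2016 community QA data and on online customer service dialogue data. Comparative experiments also analyze question completeness, different structures of the generative model, threshold selection, and the way the customer service data are extracted. The results show that on the community QA data, the CNN-based matching model outperforms the RNN generative model. On the customer service dialogue data, by contrast, the RNN-based sequence learning model better captures contextual information in situated dialogue. On data built per dialogue turn with complete questions, MAP reaches 84.41%. Consecutive utterances in interactive QA often stand in a latent, context-dependent semantic complementary relation, and the thesis also studies how to recognize this relation between sentences. On the deep-model side, it builds a parallel CNN and a cascaded LSTM to extract and model abstract semantic features of sentence pairs. A support vector machine, a CNN-based model, and an RNN-based model are each used to classify the complementary relation of sentence pairs. The results show that the CNN-based method outperforms the other methods, with an F1 score of 67.8%. Finally, complementary relation recognition and matching relation recognition are combined and applied to semantic matching in interactive QA.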
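The abstract specifies only the interface of the CNN-based matcher: a sentence's word-vector matrix goes in, and a question-answer match confidence comes out. The following is a minimal, hypothetical pure-Python sketch of that interface. It does not reproduce the thesis's architecture. It assumes a single convolution layer over fixed-width word windows with tanh activation, max-over-time pooling, and a sigmoid over the cosine similarity of the two pooled feature vectors. The names `conv_max_pool`, `cosine`, and `match_confidence`, the window width, and the `scale` parameter are all illustrative assumptions. In a real model the filters would be learned rather than hand-set.

```python
import math
from typing import List

# One row per word: the sentence's word-vector matrix (toy dimensions here).
Matrix = List[List[float]]


def conv_max_pool(sentence: Matrix, filters: List[Matrix]) -> List[float]:
    """Convolve each filter over word windows, apply tanh, keep the max response."""
    width = len(filters[0])  # window size in words; assumed equal for all filters
    if len(sentence) < width:
        raise ValueError("sentence is shorter than the convolution window")
    features = []
    for f in filters:
        best = -math.inf
        for start in range(len(sentence) - width + 1):
            # Dot product between the filter and the current window of word vectors.
            s = sum(
                f[i][d] * sentence[start + i][d]
                for i in range(width)
                for d in range(len(f[i]))
            )
            best = max(best, math.tanh(s))  # max-over-time pooling
        features.append(best)
    return features


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def match_confidence(question: Matrix, answer: Matrix,
                     filters: List[Matrix], scale: float = 5.0) -> float:
    """Hypothetical scoring head: sigmoid(scale * cosine) of the pooled features, in (0, 1)."""
    q = conv_max_pool(question, filters)
    a = conv_max_pool(answer, filters)
    return 1.0 / (1.0 + math.exp(-scale * cosine(q, a)))


if __name__ == "__main__":
    # Toy 2-d word vectors and two width-2 filters (hand-set, not learned).
    filters = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
    question = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    answer = [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
    print(match_confidence(question, question, filters))
    print(match_confidence(question, answer, filters))
```

Sharing one set of filters between question and answer keeps the two pooled vectors in the same feature space, which is what makes the cosine comparison meaningful. The abstract does not say whether the thesis shares weights this way.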
[Abstract]: With the development of Internet technology and the rapid growth of information, people urgently need an accurate and efficient way to obtain information. From search engines to intelligent interactive question answering systems, the way information is obtained has come ever closer to natural interaction. This change has two causes: the emergence of massive data, and rapid progress in machine learning and natural language processing. As a result, question answering systems have entered a stage of intelligent, interactive, generation-based question answering that spans many domains and draws on free text and heterogeneous information. Unlike search engines, question answering systems do not require users to choose from multiple candidate documents. Instead, they better understand questions posed in natural language and return concise, precise answers. With the success of Siri and Watson, intelligent interactive question answering has become a research hotspot in recent years. In the business world, it also shows growing potential to replace human customer service. Building a more intelligent interactive question answering system, however, requires learning from existing customer service logs. The key to constructing such a learning system is to identify two relationships in complex interactive Q&A logs: the matching relationship between question and answer sentences, and the complementary relationship between consecutive sentences. To match customer questions with customer service answers, this paper constructs a CNN-based semantic matching model and an RNN-based generative model. The input layer of each model is the word vector matrix of a sentence, and the output layer gives the confidence of the question-answer match. The models are compared on the SemEval-2016 community question answering data and on online customer service conversation data. In addition, question completeness, different structures of the generative model, threshold selection, and the way customer service data are extracted are compared and analyzed.
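Because the matcher outputs a confidence rather than a hard decision, a threshold is needed to turn scores into match / no-match labels. The abstract lists threshold selection among the experimental factors but does not say how the threshold was chosen. One common approach, shown here as an assumption rather than the thesis's procedure, is to sweep the distinct confidence scores on a development set and keep the threshold that maximizes F1. The function names `f1_at` and `choose_threshold` are hypothetical.

```python
from typing import Sequence, Tuple


def f1_at(scores: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 of the rule 'predict match if score >= threshold' (labels: 1 = match, 0 = no match)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def choose_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, F1) maximizing F1, trying each distinct score as a cut-off."""
    best_t, best_f1 = 0.5, -1.0  # fallback values, returned only if scores is empty
    for t in sorted(set(scores)):
        f1 = f1_at(scores, labels, t)
        if f1 > best_f1:  # strict '>' keeps the lowest threshold among ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


if __name__ == "__main__":
    dev_scores = [0.9, 0.8, 0.4, 0.3]  # toy development-set confidences
    dev_labels = [1, 1, 0, 0]
    print(choose_threshold(dev_scores, dev_labels))
```

Sweeping only the observed scores loses nothing. Under the `>=` rule, predictions change only when the threshold crosses one of those scores, so every other cut-off reproduces an outcome already tried, apart from rejecting everything, which has F1 = 0.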
The experimental results show that on the community question answering data, the CNN-based matching model in this paper outperforms the RNN generative model. On the customer service dialogue data, the RNN-based sequence learning model better learns the contextual information in situated dialogue. MAP reached 84.41% on data built per dialogue turn with complete questions. Consecutive, context-dependent sentences in interactive Q&A often stand in a latent semantic complementary relationship, and this paper also studies how to recognize that relationship between sentences. On the deep-model side, a parallel CNN and a cascaded LSTM are constructed to extract and model abstract semantic features of sentence pairs. A support vector machine (SVM), a CNN-based model, and an RNN-based model are each used to classify the complementary relationship of sentence pairs. The experimental results show that the CNN-based recognition method outperforms the other methods, with an F1 score of 67.8%. Finally, this paper combines complementary relationship recognition with matching relationship recognition and applies them to semantic matching in interactive Q&A.
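The matching results are reported as MAP (mean average precision), a standard ranking metric for answer selection. What follows is a minimal sketch of the usual definition. For each question, candidate answers are ranked by model confidence. Average precision is the mean of precision@k over the ranks k that hold a relevant answer, and MAP averages this value over questions. Scoring a question with no relevant answer as 0 is an assumption here, and official scorers may handle such questions differently. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def average_precision(ranked_labels: Sequence[int]) -> float:
    """AP for one question; ranked_labels[k] is 1 if the answer at rank k+1 is relevant."""
    hits = 0
    precision_sum = 0.0
    for k, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            precision_sum += hits / k  # precision@k at each relevant rank
    return precision_sum / hits if hits else 0.0  # assumption: no relevant answer -> AP = 0


def rank_by_confidence(candidates: Sequence[Tuple[float, int]]) -> List[int]:
    """Sort (confidence, label) pairs by descending confidence and return the labels."""
    return [label for _, label in sorted(candidates, key=lambda c: c[0], reverse=True)]


def mean_average_precision(questions: Sequence[Sequence[Tuple[float, int]]]) -> float:
    """MAP over questions, each given as a list of (confidence, label) candidates."""
    if not questions:
        return 0.0
    return sum(average_precision(rank_by_confidence(q)) for q in questions) / len(questions)


if __name__ == "__main__":
    toy = [
        [(0.9, 1), (0.4, 0), (0.2, 1)],  # ranked labels 1, 0, 1 -> AP = 5/6
        [(0.7, 0), (0.6, 1)],            # ranked labels 0, 1    -> AP = 1/2
    ]
    print(mean_average_precision(toy))
```

Because every candidate answer is ranked, dividing by the number of relevant answers found equals dividing by the total number of relevant answers for that question.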
【Degree-granting institution】: Harbin Institute of Technology
【Degree level】: Master's
【Year of degree】: 2017
【CLC number】: TP391.1