Research on Question Answering Techniques for the History Subject
Published: 2018-08-27 12:06
[Abstract]: In recent years, artificial intelligence has achieved breakthroughs in many areas and has attracted growing attention. Automatic question answering (QA) is an important branch of artificial intelligence and a long-term research goal in natural language processing. Existing QA systems are usually divided into retrieval-based systems and knowledge-base systems. Both need background knowledge stored in advance: a knowledge base holds structured, easily interpreted data, whereas a retrieval-based system typically draws on a large body of Internet text. In either case, answering a question involves generating several candidate answers through related queries, computing how relevant each candidate is to the question, discarding the irrelevant candidates, and returning the best answer.
This thesis studies QA techniques for the history subject, covering question classification, question component extraction, and confidence ranking of candidate answers. Given a question, the system first analyzes it to construct a query, then retrieves several candidate paragraphs, and finally ranks the sentences in those paragraphs by confidence to obtain a short, accurate answer. The thesis applies deep learning methods to all three tasks. The specific contributions are as follows:
1. A question classification corpus and a question component extraction corpus are built for the history subject, so that history material questions can be classified and the key elements of each question identified. A dataset for answer confidence ranking in the history subject is also built.
2. A deep learning question classification model is constructed and compared with a traditional SVM baseline. The experiments show that the deep learning methods clearly outperform the traditional method; the CNN model performs best, with a Micro-F1 of 91.08% and a Macro-F1 of 86.80%.
3. Question component extraction is carried out with a CRF model and an LSTM-CRF model. The experiments show that, on a small corpus, the traditional CRF model outperforms the deep learning method, reaching an F1 of 88.51%.
4. A deep learning algorithm for answer confidence ranking is constructed, and CNN and LSTM models are compared on answer selection; the LSTM model outperforms the CNN model. The thesis also examines how different confidence measures and different loss functions affect the confidence computation, and proposes a confidence measure that harmonizes cosine similarity and Euclidean distance. The best results are obtained with the harmonized confidence measure and the hinge loss function, giving a MAP of 0.4320 and an MRR of 0.6120.
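The abstract does not state how cosine similarity and Euclidean distance are harmonized in item 4, so the following Python sketch is only one plausible reading, not the thesis's formula. It assumes the Euclidean distance d is mapped to a similarity 1 / (1 + d) and blended with the clipped cosine similarity by a harmonic mean; the function names and the margin value 0.1 are hypothetical. The pairwise hinge loss and the MAP and MRR computations follow their standard definitions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_similarity(u, v):
    # ASSUMPTION: distance d is mapped into (0, 1] as 1 / (1 + d).
    return 1.0 / (1.0 + math.dist(u, v))


def blended_confidence(u, v):
    # ASSUMPTION: harmonic mean of the two similarities, with the
    # cosine clipped at 0 so that the mean is well defined.
    cos = max(cosine_similarity(u, v), 0.0)
    euc = euclidean_similarity(u, v)  # always > 0, so no division by zero
    return 2.0 * cos * euc / (cos + euc)


def hinge_loss(pos_score, neg_score, margin=0.1):
    """Pairwise hinge loss: zero once the correct answer leads by `margin`."""
    return max(0.0, margin - pos_score + neg_score)


def reciprocal_rank(labels):
    """labels: relevance flags (1/0) of the candidates in ranked order."""
    for rank, label in enumerate(labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0


def average_precision(labels):
    """Mean of precision@k taken at each rank k holding a correct answer."""
    hits, total = 0, 0.0
    for rank, label in enumerate(labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def evaluate(ranked_label_lists):
    """Return (MAP, MRR) over a list of per-question ranked label lists."""
    n = len(ranked_label_lists)
    map_score = sum(average_precision(l) for l in ranked_label_lists) / n
    mrr_score = sum(reciprocal_rank(l) for l in ranked_label_lists) / n
    return map_score, mrr_score
```

In use, each candidate sentence would be scored against the question with `blended_confidence` on their encoded vectors, the candidates sorted by that score, and the resulting relevance flags passed to `evaluate`. During training, `hinge_loss` would be applied to the scores of a correct and an incorrect candidate for the same question.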
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TP391.1;TP18
Article ID: 2207227
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2207227.html