Research on Automatic Question Answering Methods Based on Domain Knowledge
Published: 2018-05-07 11:23
Topics: automatic question answering + translation model; Source: Harbin Institute of Technology, master's thesis, 2016
[Abstract]: In an era of ubiquitous mobile Internet access, obtaining information has become ever more convenient, and the demand for information keeps growing. Meeting the information needs of people at different levels across different domains poses a major challenge for search engines on the mobile Internet. The steady maturing of question answering (QA) technology has largely overcome the shortcomings that search engines exhibit and gives people a more natural form of human-computer interaction. A QA system can understand questions posed in natural language fairly accurately and use knowledge base retrieval to return concise answers immediately, which effectively meets people's needs. With advances in artificial intelligence, natural language processing, and related technologies, different kinds of QA systems have emerged for different forms of data. In recent years, many research institutions in China and abroad have begun to study human-like intelligence and to apply QA techniques to the examination domain. This thesis targets the history portion of the comprehensive liberal arts paper of China's college entrance examination (Gaokao) and uses natural language processing and QA techniques to build an automatic answering system that can solve Gaokao history short-answer questions. The main research contents are as follows.

Data preprocessing and platform construction. The thesis samples and analyzes past Gaokao history questions and summarizes the question types and the main difficulties in solving them. Based on the main issues of each question type, it collects and stores the data for a domain knowledge base and builds a history retrieval system, which ensures that the answering system can run normally. Because history questions may contain Classical Chinese material, a parallel corpus of a certain scale is collected from the Internet and used to train a Classical Chinese discrimination model and a Classical Chinese translation model.

Knowledge-base-based candidate answer discovery. To accurately obtain the documents relevant to a question from the knowledge base, the thesis first applies the traditional steps of keyword extraction, information retrieval, and confidence calculation. Then, given the particular nature of history short-answer questions, it tries a question-answer matching method based on convolutional neural networks, recasting candidate answer discovery as a sequence prediction problem and achieving deeper matching through the convolutional neural network model.

Multi-document answer generation. Knowledge base retrieval yields a set of candidate documents that contain the key points of the answer. To extract from them a concise, accurate answer that fits the question, the thesis borrows ideas from multi-document summarization: the sentences in the document set are grouped into multiple clusters by text clustering, and multi-sentence compression is then applied to each cluster to extract information and generate the answer.

To facilitate experimental analysis of system performance, the thesis establishes a unified manual scoring standard and tests the system on past Gaokao questions, demonstrating its effectiveness.
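The answer generation stage described in the abstract (cluster the candidate sentences, then reduce each cluster to one answer point) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the function names, the bag-of-words cosine similarity, the greedy single-pass clustering, the 0.3 threshold, and the English example sentences. In particular, picking the most central sentence of each cluster is a simple stand-in for multi-sentence compression, not the compression method itself.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercase word counts (a Chinese system would use a word segmenter instead)."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_sentences(sentences, threshold=0.3):
    """Greedy single-pass clustering (hypothetical choice, not the thesis's algorithm).

    Each sentence joins the first cluster whose seed sentence is similar
    enough; otherwise it starts a new cluster.
    """
    vectors = [bag_of_words(s) for s in sentences]
    clusters = []  # lists of sentence indices; the first index is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters, vectors


def representative(cluster, vectors):
    """Index of the member with the highest total similarity to the other members."""
    def score(i):
        return sum(cosine(vectors[i], vectors[j]) for j in cluster if j != i)
    return max(cluster, key=score)


def generate_answer(sentences, threshold=0.3):
    """Return one representative sentence per cluster as the answer points."""
    clusters, vectors = cluster_sentences(sentences, threshold)
    return [sentences[representative(c, vectors)] for c in clusters]


# Made-up candidate sentences standing in for retrieved knowledge base text.
candidates = [
    "The reform strengthened central authority",
    "Central authority was strengthened by the reform",
    "Trade along the coast expanded rapidly",
    "Coastal trade expanded rapidly in this period",
]
answer_points = generate_answer(candidates)
```

With these example sentences, the two statements about the reform fall into one cluster and the two about trade into another, so the sketch returns two answer points. A fuller system would replace the representative-selection step with an actual sentence compression method.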
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP391.1
Article ID: 1856725
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1856725.html