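Automatic Question Generation System Based on Entity-Type Encyclopedia Knowledge
Published: 2018-05-04 01:15
Topics: interactive question answering + question generation. Source: Harbin Institute of Technology (哈尔滨工业大学), 2012 master's thesis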
[Abstract]: With the explosive growth of online information, the web is flooded with information of every kind. People have become accustomed to searching the web for solutions to their problems. Users who are not familiar with search techniques often spend a great deal of time filtering the results returned by search engines. Interactive question answering systems were developed to address this information overload. A question answering system uses natural language processing to analyze the question a user submits, retrieve relevant answers, and return them to the user.

Automatic question generation supplies question-answer pairs to an interactive question answering system when human-computer interaction is lacking. Depending on system needs, these pairs can be restricted to a particular domain or serve as general-domain pairs. Automatic question generation for English has advanced considerably and has been applied in question answering systems, dialogue systems, and tutoring systems. Research on automatic question generation for Chinese has only just begun, and many problems remain to be solved. To address the incompleteness of corpora for Chinese question answering systems, this thesis proposes supplementing those corpora with automatically generated Chinese question-answer pairs.

The research covers the following:

1. Automatic Chinese question generation system. Current question generation systems cannot understand the meaning of a sentence directly as a person does, so information preprocessing before question generation is necessary for every question generation system. This thesis adopts a distributed design: Chinese information extraction is divided into two major parts covering seven types of information, each handled by a different functional unit machine, and the processed results are returned to the question generation system. The thesis designs a question generation algorithm that combines syntactic information with sentence-pattern information and uses this information to generate wh-questions or causal questions.

2. Automatic classification of generated questions. This thesis proposes an algorithm based on named entity classification and partial template matching, which yields six classes of questions: person-name questions, place-name questions, time-expression questions, organization-name questions, definition questions, and causal questions.

3. System evaluation and improvement. English question generation systems have defined a series of evaluation criteria. This thesis borrows some of these criteria to evaluate the system. Some users are also invited to test the system, and the system is refined and extended in a targeted way based on their feedback.
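The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of entity-driven question generation with class labels, not the thesis's algorithm. It assumes an upstream step has already produced named-entity annotations, and it covers four of the six classes by entity type plus the causal class via a single hard-coded pattern. The names `WH_BY_ENTITY`, `CAUSAL_PATTERN`, `QAPair`, and `generate_questions`, the entity tags, and the "因为…,所以…" pattern are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical mapping: entity tag -> (question class, Chinese interrogative).
WH_BY_ENTITY = {
    "PER": ("person", "谁"),
    "LOC": ("place", "哪里"),
    "TIME": ("time", "什么时候"),
    "ORG": ("organization", "哪个机构"),
}

# Hypothetical causal template: "因为 X,所以 Y" (because X, therefore Y).
CAUSAL_PATTERN = re.compile(r"因为(.+?)[,,]所以(.+)")


@dataclass
class QAPair:
    question: str
    answer: str
    category: str


def generate_questions(sentence, entities):
    """Build question-answer pairs from one declarative sentence.

    `entities` is a list of (surface_text, entity_tag) tuples that an
    upstream named-entity recognizer is assumed to have produced.
    """
    body = sentence.rstrip("。.")  # drop the sentence-final period
    pairs = []

    # Entity questions: swap the entity for an interrogative in place.
    for text, tag in entities:
        if tag not in WH_BY_ENTITY or text not in body:
            continue
        category, wh_word = WH_BY_ENTITY[tag]
        question = body.replace(text, wh_word, 1) + "?"
        pairs.append(QAPair(question, text, category))

    # Causal question: ask "why" about the effect; the cause is the answer.
    match = CAUSAL_PATTERN.search(body)
    if match:
        cause, effect = match.group(1), match.group(2)
        pairs.append(QAPair("为什么" + effect + "?", cause, "causal"))

    return pairs


if __name__ == "__main__":
    demo = generate_questions(
        "李白出生于701年。", [("李白", "PER"), ("701年", "TIME")]
    )
    for pair in demo:
        print(pair.category, pair.question, pair.answer)
```

In-place substitution works here because Chinese wh-words stay in the position of the phrase they replace. A real system would also need the syntactic and sentence-pattern analysis the abstract mentions, as well as definition questions, which this sketch omits.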
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3
Article ID: 1840929
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1840929.html