Automatic Question Generation System Based on Entity-Type Encyclopedic Knowledge
Published: 2018-05-04 01:15
Topics: interactive question answering + question generation. Source: Harbin Institute of Technology, 2012 master's thesis
[Abstract]: With the explosive growth of online information, the network is flooded with information of every kind. People are now accustomed to searching the Internet for solutions to their problems. Users who are unfamiliar with search techniques often spend a great deal of time filtering the results that search engines return. Interactive question answering systems address this information overload effectively: they analyze the user's question with natural language processing methods, retrieve relevant answers, and return them to the user.

Automatic question generation supplies question-answer pairs to an interactive question answering system when human-computer interaction is lacking. These pairs can be restricted to a particular domain as the system requires, or serve as general-domain pairs. Automatic question generation for English has advanced considerably and has been applied to question answering systems, dialogue systems, and tutoring systems. Research on automatic question generation for Chinese has only just begun, and many problems remain open. To address the incompleteness of corpora for Chinese question answering systems, this thesis proposes supplementing them with automatically generated Chinese question-answer pairs.

The work covers the following:

1. Automatic Chinese question generation system. Current question generation systems cannot understand the meaning of a sentence directly the way a person does, so every such system must preprocess information before generating questions. This thesis adopts a distributed design: Chinese information extraction is divided into two major parts covering seven types of information, each handled by a different functional unit, and the processed results are returned to the question generation system. The thesis designs a question generation algorithm that combines syntactic information with sentence-pattern information and uses this information to generate wh-questions or causal questions.

2. Automatic classification of generated questions. The thesis proposes an algorithm based on named entity classification and partial template matching, which produces six types of questions: person-name questions, place-name questions, time-expression questions, organization-name questions, definition questions, and causal questions.

3. System evaluation and improvement. English question generation systems have defined a series of evaluation criteria. This thesis borrows some of those criteria to evaluate the system, and also invites a number of users to test it, refining and extending the system in light of their feedback.
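The abstract does not give the algorithm itself, so the following is only a minimal sketch of how named-entity labels and partial templates could yield the six question types. It assumes entities have already been extracted by a preprocessing step; the labels (`PER`, `LOC`, `TIME`, `ORG`, `DEF`, `CAUSE`), the question-word table, and the two sentence patterns are hypothetical illustrations, not the thesis's actual rules, and the syntactic analysis the thesis relies on is left out.

```python
from dataclasses import dataclass

# Hypothetical mapping from entity label to a Chinese question word.
# Chinese keeps the question word in place, so simple substitution works.
QUESTION_WORDS = {
    "PER": "谁",        # person name
    "LOC": "哪里",      # place name
    "TIME": "什么时候",  # time expression
    "ORG": "哪个机构",   # organization name
}


@dataclass
class Entity:
    text: str   # surface string found in the sentence
    label: str  # entity type assigned by an (assumed) upstream recognizer


@dataclass
class QAPair:
    question: str
    answer: str
    qtype: str


def generate_entity_questions(sentence, entities):
    """Replace each known entity with its question word to form a question."""
    body = sentence.rstrip("。")
    pairs = []
    for ent in entities:
        word = QUESTION_WORDS.get(ent.label)
        if word is None or ent.text not in body:
            continue  # unknown label or entity not present: skip
        question = body.replace(ent.text, word, 1) + "？"
        pairs.append(QAPair(question, ent.text, ent.label))
    return pairs


def generate_pattern_questions(sentence):
    """Match two illustrative sentence patterns: causal and definition."""
    body = sentence.rstrip("。")
    pairs = []
    if body.startswith("因为") and "，所以" in body:
        # "因为 A，所以 B" -> "为什么 B？" with A as the answer
        cause, effect = body[len("因为"):].split("，所以", 1)
        pairs.append(QAPair(f"为什么{effect}？", cause, "CAUSE"))
    elif "是一种" in body:
        # "X 是一种 Y" -> "什么是 X？" with the whole sentence as the answer
        term = body.split("是一种", 1)[0]
        pairs.append(QAPair(f"什么是{term}？", body, "DEF"))
    return pairs


if __name__ == "__main__":
    ents = [Entity("鲁迅", "PER"), Entity("1881年", "TIME"), Entity("绍兴", "LOC")]
    for pair in generate_entity_questions("鲁迅1881年出生于绍兴。", ents):
        print(pair.qtype, pair.question, pair.answer)
    for pair in generate_pattern_questions("因为下雨，所以比赛取消。"):
        print(pair.qtype, pair.question, pair.answer)
```

Because each generated pair carries the label of the entity or pattern that produced it, classification into the six types falls out of generation in this sketch; a real system would need the syntactic checks the abstract mentions to reject ungrammatical substitutions.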
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP391.3
Article ID: 1840929
Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1840929.html