Research on the Application of Wikipedia in an IR4QA System

Published: 2018-06-29 00:07

Topics: question answering system + IR4QA; Source: Wuhan University of Science and Technology, 2012 master's thesis

[Abstract]: A question answering (QA) system is a new generation of intelligent search engine: it lets users ask questions in natural language and returns accurate answers. Compared with a traditional search engine, a QA system therefore meets users' query needs better and retrieves the answers they need more accurately. Building on work done for NTCIR8, this thesis studies question understanding and information retrieval, the two main parts of a Chinese QA system that together make up the IR4QA stage, and implements a working IR4QA system.

Question understanding is a research topic shared by every system with a natural language interface. It is the first stage a QA system executes, and its analysis results strongly affect the stages that follow. Information retrieval is an intermediate stage, and its results largely determine the output quality of the subsequent modules. By comparing and analyzing the problems that currently exist in these two stages of general QA systems, the thesis identifies more effective methods and applies them in our system. Building on previous research, the thesis makes the following contributions:

(1) It surveys and analyzes the state of research, in China and abroad, on automatic QA systems and search engine technology. Combining the strengths of both kinds of system, and targeting the problems users currently face with search engines (cluttered results, long search times, and low result accuracy), it proposes applying Wikipedia to automatic question answering, that is, a Wikipedia-based IR4QA system, and designs and implements that system.

(2) By analyzing the results the system is ultimately expected to achieve, a set of practical methods was laid down early in the design. On that basis, and following a layered, modular design approach, the design principles were fixed and the system was divided into five modules: index generation, question analysis, query expansion, document retrieval, and document re-ranking.

(3) It studies the key technologies involved in the system, builds up the theoretical and technical groundwork for the difficulties met during implementation, and proposes practical solutions.

(4) For question classification, given the characteristics of the questions in the question set and the heavy workload of Chinese syntactic and semantic analysis, and with the aim of improving system quality, the system does not use the machine-learning classification methods common in English QA systems. Instead it applies heuristic rules that identify the interrogative words in a question. This recognizes the syntactically simple questions in the question set well.

(5) To handle vocabulary mismatch between a question and the documents being searched, it adopts a Wikipedia-based query expansion method consisting of Wikipedia page lookup, location of relevant paragraphs, and selection of expansion terms. Comparative experiments show that this method effectively improves the quality of the retrieval results.

(6) To further improve the accuracy of the retrieval results, the document re-ranking module re-ranks the retrieved documents with the BM25 algorithm, which yields the final retrieval results.
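The BM25 re-ranking step named in item (6) can be sketched roughly as follows. This is a minimal illustration of the standard Okapi BM25 formula, not the thesis's actual implementation: the function names `bm25_scores` and `rerank`, the parameter values `k1=1.2` and `b=0.75` (common defaults), the smoothed IDF variant, and the toy documents are all assumptions. Documents are assumed to be tokenized already; Chinese text would first need word segmentation.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    k1 and b are common default values, assumed here for illustration.
    """
    if not docs:
        return []
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            # Smoothed IDF; the "+ 1.0" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rerank(query_terms, docs):
    """Return document indices ordered from highest to lowest BM25 score."""
    scores = bm25_scores(query_terms, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Hypothetical candidate documents returned by an initial retrieval step.
candidates = [
    ["search", "engine", "index"],
    ["wikipedia", "query", "expansion", "method"],
    ["query", "log", "analysis", "of", "search", "engine", "users"],
]
query = ["wikipedia", "query", "expansion"]
print(rerank(query, candidates))  # [1, 2, 0]
```

In a pipeline like the one described above, the query passed to `rerank` would presumably be the original question terms plus the expansion terms selected from Wikipedia, so documents matching the expanded vocabulary move up the list.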
[Degree-granting institution]: Wuhan University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TP391.3


Article ID: 2079950

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2079950.html
