Research on Answer Extraction for Internet-Based Automatic Question Answering

Posted: 2018-01-28 22:42

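Keywords: automatic question answering; answer extraction; graph model; learning to rank; word representation; paraphrasing. Source: Tianjin University, 2014 doctoral dissertation. Type: degree thesis.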

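【Abstract】: Internet-based automatic question answering answers natural-language questions from the results returned by a search engine. It can take full advantage of the search engine's high-quality results and removes the need to store a large document collection. Answer extraction generates the answer from the retrieved text and consists of two steps, candidate generation and candidate ranking. Because search snippets are noisy and often lack complete sentence structure, answer extraction over search results differs substantially from answer extraction over well-formed text, and traditional methods lose performance on this task. This dissertation studies how to optimize answer extraction for search results and covers the following topics. (1) Because the features that signal the correct answer are weak in some search results, it proposes a candidate generation method based on a paragraph graph model, in which candidate generation in one paragraph receives information from other paragraphs that helps improve the candidates generated in the current paragraph. Experiments show that the model effectively improves both the precision and the recall of candidate generation. (2) To cope with noisy search results and incomplete syntactic structure, it proposes a prune-rank-merge framework that integrates different candidate generation methods and reranks the candidates with learning to rank. The framework effectively reduces the impact of noise in the search results, and experiments show that its ranking method outperforms the current best algorithm on candidate ranking over search results. (3) Because the wording of search results often differs greatly from that of the original question, and similarity computation scales poorly, it proposes two word-representation-based methods for computing the similarity between the question and candidate answers: the textual similarity between a search result and the question, and the semantic similarity between a candidate answer and the answer type. Experiments show that both similarity measures effectively improve candidate ranking. (4) To address the wording differences between search results and questions, it explores the use of paraphrase generation, proposing a method that generates paraphrases with a dual machine translation system trained by joint learning, together with an evaluation metric for paraphrase generation. Using this method to generate paraphrases of the question increases the diversity of the paraphrased representations and lessens the effect of differing expressions when similarity is computed; experiments show that it improves candidate ranking. Overall, the dissertation uses the paragraph graph model for candidate generation, combines it with other candidate generation methods, and ranks the candidates with learning to rank; on this basis, similarity features computed from word vectors and paraphrases further improve the ranking. This work reduces the impact of snippet noise and related problems on question-answering results when answers are generated from search results, enabling answer extraction for Internet-based question answering to outperform the current best answer extraction methods without relying on syntactic or semantic similarity.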
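The abstract does not say how the paragraph graph passes information between paragraphs. Purely as an illustration, the sketch below assumes a generic score-propagation scheme in the spirit of personalized PageRank: each search-result paragraph is a node, the edge weights between paragraphs are hypothetical inputs, and each paragraph's candidate scores are repeatedly mixed with the weighted average of its neighbours' scores. The function `propagate_candidate_scores` and its parameters `alpha` and `iterations` are assumptions for this example, not the dissertation's model.

```python
from typing import Dict, List

Scores = Dict[str, float]


def propagate_candidate_scores(
    local: List[Scores],
    weights: List[List[float]],
    alpha: float = 0.5,
    iterations: int = 20,
) -> List[Scores]:
    """Smooth each paragraph's candidate scores with those of its neighbours.

    local[p][c]   -- score of candidate c produced inside paragraph p alone
    weights[p][q] -- hypothetical non-negative edge weight between paragraphs p and q
    alpha         -- share of each score taken from neighbouring paragraphs (0 <= alpha < 1)
    """
    n = len(local)
    candidates = sorted({c for scores in local for c in scores})
    scores = [dict(s) for s in local]
    for _ in range(iterations):
        updated: List[Scores] = []
        for p in range(n):
            # Total weight of p's neighbours (self-loops ignored), used to normalise.
            total = sum(weights[p][q] for q in range(n) if q != p)
            new: Scores = {}
            for c in candidates:
                neighbour = 0.0
                if total > 0:
                    neighbour = sum(
                        weights[p][q] * scores[q].get(c, 0.0) for q in range(n) if q != p
                    ) / total
                # Keep part of the paragraph's own evidence, borrow the rest.
                new[c] = (1 - alpha) * local[p].get(c, 0.0) + alpha * neighbour
            updated.append(new)
        scores = updated
    return scores


if __name__ == "__main__":
    # Snippet 0 weakly prefers "Lyon"; snippets 1 and 2 strongly support "Paris".
    local_scores = [{"Paris": 0.2, "Lyon": 0.3}, {"Paris": 0.9}, {"Paris": 0.8, "Lyon": 0.1}]
    uniform = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
    smoothed = propagate_candidate_scores(local_scores, uniform)
    print(round(smoothed[0]["Paris"], 2), round(smoothed[0]["Lyon"], 2))
```

With uniform edge weights and `alpha = 0.5`, the first snippet's scores converge to roughly 0.46 for "Paris" and 0.2 for "Lyon", so evidence from the other paragraphs overturns its weak local preference. Because `alpha < 1` and the neighbour weights are normalised, each update is a contraction and the iteration converges.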
[English Abstract]: Internet-based automatic question answering answers natural-language questions using the results returned by a search engine, which lets it make full use of high-quality search results. Answer extraction generates the answer from the retrieved text and comprises candidate generation and candidate ranking. Because search snippets are noisy and their sentence structure is often incomplete, answer extraction from search results differs greatly from answer extraction from well-formed text, and traditional methods suffer degraded performance on this task. This Ph.D. dissertation discusses how to optimize answer extraction for search results and covers the following topics. For search results in which the features indicating the correct answer are not prominent, it proposes a candidate generation method based on a paragraph graph model: candidate generation in one paragraph can receive information from other paragraphs, which helps improve the candidates generated in the current paragraph. This model effectively improves the precision and recall of candidate generation. For noisy search results with incomplete syntactic structure, it proposes a prune-rank-merge framework that integrates different candidate generation methods and reranks the candidates with learning to rank. This framework effectively reduces the impact of noise in the search results, and experimental results show that its ranking method outperforms the best existing algorithm on candidate ranking over search results. Because the wording of search results can differ greatly from that of the original question, it proposes two word-representation-based similarity measures between questions and candidate answers: the textual similarity between search results and questions, and the semantic similarity between candidate answers and answer types.
Using these two word-representation-based similarity measures effectively improves candidate ranking. To address the difference in wording between search results and questions, the dissertation also explores paraphrase generation. It proposes a method for generating paraphrases with a dual machine translation system based on joint learning, together with an evaluation metric for paraphrase generation. Applying the method to generate paraphrases of the question increases the diversity of the paraphrased representations and reduces the influence of differing expressions when similarity is computed. Experiments show that the proposed paraphrase generation method improves candidate ranking. In the overall system, candidates are generated with the paragraph graph model, combined with candidates from other generation methods, and ranked with learning to rank; on this basis, similarity features computed from word vectors and paraphrases further improve the ranking. This research alleviates the effect of noise in search snippets on question-answering results when answers are generated from search results, so that answer extraction for Internet-based question answering outperforms the best existing answer extraction methods without relying on syntactic or semantic similarity.
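The abstract names two word-representation-based similarity features (question vs. search result, and candidate answer vs. answer type) but not their formulas. A common baseline, assumed here and not taken from the dissertation, is the cosine similarity of averaged word vectors. The embedding table `TOY_EMBEDDINGS` and the helper names `text_similarity` and `type_similarity` are hypothetical; in practice the vectors would come from pretrained word embeddings, and the resulting scores could serve as features for a learning-to-rank model.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical toy embeddings for illustration only; a real system would load
# pretrained word vectors instead.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "capital": [0.9, 0.1, 0.0],
    "city": [0.8, 0.2, 0.0],
    "france": [0.1, 0.9, 0.0],
    "french": [0.2, 0.8, 0.0],
    "paris": [0.6, 0.6, 0.1],
    "location": [0.7, 0.5, 0.0],
    "banana": [0.0, 0.0, 1.0],
}


def average_vector(tokens: Sequence[str], emb: Dict[str, Vector]) -> Vector:
    """Mean of the in-vocabulary token vectors; zero vector if none are known.

    Assumes `emb` is non-empty and all vectors share one dimension.
    """
    dim = len(next(iter(emb.values())))
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_similarity(question: Sequence[str], snippet: Sequence[str], emb: Dict[str, Vector]) -> float:
    """Similarity between a tokenised question and a tokenised search snippet."""
    return cosine(average_vector(question, emb), average_vector(snippet, emb))


def type_similarity(candidate: Sequence[str], answer_type: str, emb: Dict[str, Vector]) -> float:
    """Similarity between a tokenised candidate answer and an answer-type word."""
    return cosine(average_vector(candidate, emb), average_vector([answer_type], emb))


if __name__ == "__main__":
    question = ["capital", "of", "france"]  # "of" is out of vocabulary and ignored
    print(text_similarity(question, ["french", "city"], TOY_EMBEDDINGS))
    print(type_similarity(["paris"], "location", TOY_EMBEDDINGS))
```

The snippet "french city" shares no word with "capital of france" yet scores close to 1, which is the kind of wording mismatch between search results and questions that exact word overlap would miss.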
【Degree-granting institution】: Tianjin University
【Degree level】: Doctorate
【Year conferred】: 2014
【Classification number】: TP391.1

Article ID: 1471749

Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1471749.html