Research on Answer Extraction for Internet-Based Automatic Question Answering

Published: 2018-01-28 22:42

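  Keywords: automatic question answering; answer extraction; graph model; learning to rank; word representation; paraphrase. Source: Tianjin University, doctoral dissertation, 2014. Document type: degree thesis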


[Abstract]: Internet-based automatic question answering answers natural-language questions from the results returned by a search engine. It can take full advantage of the search engine's high-quality results and removes the need to store a large document collection. Answer extraction generates the answer from the retrieved text and consists of candidate generation and candidate ranking. Because search snippets are noisy and often contain incomplete sentences, answer extraction from search results differs substantially from answer extraction over well-formed text, and traditional methods lose performance on this task. This doctoral dissertation discusses how to optimize answer extraction for these properties of search results, covering the following topics.

For search results in which the features that signal the correct answer are weak, the dissertation proposes a candidate generation method based on a paragraph graph model. Candidate generation in one paragraph can receive information from other paragraphs, which helps improve the candidates generated in the current paragraph. Experiments show that the model effectively improves the accuracy and recall of candidate generation.

For search results that are noisy and syntactically incomplete, the dissertation proposes pruned rank fusion to integrate different candidate generation methods, and reranks the candidates with learning to rank. The framework effectively reduces the influence of noise in the search results. Experiments show that the ranking method outperforms the best existing algorithm on candidate ranking over search results.

For the large gap between the wording of search results and the original question, and the poor scalability of similarity computation, the dissertation proposes two word-representation-based methods for computing the similarity between a question and a candidate answer: the text similarity between a search result and the question, and the semantic similarity between a candidate answer and the answer type. Experiments show that both similarities effectively improve candidate ranking.

For the differences in expression between search results and questions, the dissertation explores the use of paraphrase generation. It proposes a method that generates paraphrases with dual machine translation systems trained by joint learning, together with an evaluation metric for paraphrase generation. Generating paraphrases of the question with this method increases the diversity of the paraphrases and lessens the effect of wording differences when computing similarity. Experiments show that the proposed paraphrase generation method improves candidate ranking.

Overall, the dissertation uses the paragraph graph model for candidate generation, then combines it with other candidate generation methods and ranks the candidates with learning to rank. On top of this, similarity features computed from word vectors and paraphrases further improve the ranking. The work reduces the effect of snippet noise and related problems on answers generated from search results, so that answer extraction for Internet-based question answering outperforms the best existing answer extraction methods without relying on syntactic or semantic similarity.
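The abstract does not say how the word-representation-based text similarity between a question and a search result is computed. As a minimal sketch of one common approach, assuming averaged word vectors compared by cosine similarity, the following illustrates the general idea. The embedding table `TOY_EMBEDDINGS`, its values, and the function names are hypothetical and are not taken from the dissertation.

```python
import math

# Toy 3-dimensional word vectors; hypothetical values for illustration only.
TOY_EMBEDDINGS = {
    "capital": [0.9, 0.1, 0.0],
    "city": [0.8, 0.2, 0.1],
    "france": [0.1, 0.9, 0.0],
    "paris": [0.3, 0.8, 0.1],
    "banana": [0.0, 0.1, 0.9],
    "fruit": [0.1, 0.0, 0.8],
}


def average_vector(tokens, embeddings):
    """Average the vectors of the known tokens; return None if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def text_similarity(question, snippet, embeddings=TOY_EMBEDDINGS):
    """Similarity of a question and a snippet via averaged word vectors."""
    q = average_vector(question.lower().split(), embeddings)
    s = average_vector(snippet.lower().split(), embeddings)
    if q is None or s is None:
        return 0.0  # no known words on one side: treat as dissimilar
    return cosine(q, s)
```

Because the comparison happens in vector space rather than on exact word overlap, a snippet such as "paris city" can score higher against the question "capital france" than "banana fruit" does, even though neither snippet shares a word with the question. A score like this could serve as one feature for a learned ranker.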
[Degree-granting institution]: Tianjin University
[Degree level]: Doctorate
[Year degree awarded]: 2014
[Classification number]: TP391.1

Article ID: 1471749


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1471749.html

