Answer Selection for Non-Factoid Questions

Published: 2018-08-07 13:50
[Abstract]: With the rise of community question-answering websites, a growing volume of user-generated data has accumulated. This data is not only massive and diverse but also of high quality and high reuse value. To manage and exploit it efficiently, researchers have carried out extensive research and practice on it in recent years, and community question answering (CQA) is one widely studied topic.
CQA research builds on data from Q&A communities and differs markedly from traditional question answering systems. Traditional systems mainly address factoid questions whose answers are phrases or named entities, and their main modules are question understanding and answer extraction. CQA has no such restriction and is particularly well suited to non-factoid questions that ask for advice or opinions. CQA research covers question retrieval and recommendation, question interestingness, question and answer quality, answer ranking, and user authority. Among these, question retrieval and answer selection, as the core modules of CQA, have drawn wide attention from both academia and industry.
The main work of this thesis is to build a CQA system on large-scale Q&A community data and to study in depth the question analysis, question retrieval, and answer selection techniques involved.
In building the system, this work collected large-scale data of more than 130 million questions and 1 billion answers from community sites such as Yahoo! Answers. Compared with earlier CQA research based on data at the scale of millions, this is markedly different and of high practical value. On top of this data, automatic query classification is used to improve the efficiency and effectiveness of each query.
For question retrieval, this work proposes using the structural and semantic information of the query question and the candidate questions, combined with a learning-to-rank algorithm that fuses features of several different types. A ranking model is learned from training data to improve retrieval relevance and to mitigate the word mismatch problem. Experiments show that the ranking model trained with Ranking SVM significantly outperforms previous methods on accuracy and other evaluation metrics across different datasets.
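The abstract does not give the feature set or the training setup, so the following is only a minimal sketch of the general pairwise idea behind Ranking SVM: each pair of candidates with different relevance grades becomes a difference vector, and a linear model is fitted with a hinge loss on those differences. The two features (lexical overlap and semantic similarity), the toy grades, and all function names are assumptions made for illustration, and the plain subgradient trainer stands in for a real SVM solver.

```python
import random

def pairwise_differences(candidates):
    """For one query, turn [(features, relevance), ...] into difference
    vectors x_i - x_j for every pair where candidate i is more relevant."""
    diffs = []
    for xi, ri in candidates:
        for xj, rj in candidates:
            if ri > rj:
                diffs.append([a - b for a, b in zip(xi, xj)])
    return diffs

def train_ranking_svm(queries, dim, epochs=200, lr=0.1, reg=0.01, seed=0):
    """Fit a linear scorer w with L2 regularisation and a hinge loss
    max(0, 1 - w.d) on each preference difference d (subgradient descent)."""
    rng = random.Random(seed)
    diffs = [d for q in queries for d in pairwise_differences(q)]
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            w = [wi * (1.0 - lr * reg) for wi in w]  # regulariser step
            if margin < 1.0:                         # pair violates the margin
                w = [wi + lr * di for wi, di in zip(w, d)]
    return w

def rank(w, feature_vectors):
    """Sort candidate feature vectors by descending linear score."""
    return sorted(feature_vectors,
                  key=lambda x: sum(wi * xi for wi, xi in zip(w, x)),
                  reverse=True)

# Hypothetical toy data: features are [lexical_overlap, semantic_similarity],
# and a higher relevance grade means a better match to the query.
QUERIES = [
    [([0.9, 0.2], 0), ([0.3, 0.8], 2), ([0.5, 0.5], 1)],
    [([0.8, 0.1], 0), ([0.2, 0.9], 1)],
]

weights = train_ranking_svm(QUERIES, dim=2)
ranked = rank(weights, [[0.9, 0.1], [0.1, 0.9]])
```

In this toy data the semantically similar candidates are always preferred, so the learned weight on the second feature ends up positive and the candidate with high semantic similarity is ranked first even though it shares fewer words with the query.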
After question retrieval has found candidate questions semantically similar to the query, this work further proposes a new unsupervised learning method, based on the content of question-answer pairs, that judges answer quality in order to filter out low-quality answers. It makes three assumptions about Q&A community data: (1) most answers to a question are normal, and only a few are low quality and need to be filtered out; (2) low-quality answers can be detected by comparing them with the other answers to the same question; (3) different answers should be judged by different quality criteria. Based on these assumptions, the method uses content-based features and detects low-quality answers by minimizing the variance of the answer feature vectors while retaining as many answers as possible. Experiments show a clear improvement over the baseline in ROC value.
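The exact objective used in the thesis is not stated in the abstract. One way to read "minimize the variance of the answer feature vectors while keeping as many answers as possible" is as a trade-off between total variance and the number of answers dropped. The sketch below implements that reading greedily for a single question; the `penalty` parameter, the two content features, and the greedy search are all assumptions, not the author's formulation.

```python
def total_variance(vectors):
    """Sum over feature dimensions of the population variance."""
    n = len(vectors)
    if n == 0:
        return 0.0
    total = 0.0
    for k in range(len(vectors[0])):
        mean = sum(v[k] for v in vectors) / n
        total += sum((v[k] - mean) ** 2 for v in vectors) / n
    return total

def filter_low_quality(answers, penalty=0.05):
    """Greedily drop the answer whose removal lowers the variance most,
    as long as the drop in variance exceeds `penalty` (an assumed price
    for discarding one answer). Returns (kept_indices, dropped_indices)."""
    kept = list(range(len(answers)))
    dropped = []
    while len(kept) > 2:
        current = total_variance([answers[i] for i in kept])
        best_index, best_variance = None, current
        for i in kept:
            variance = total_variance([answers[j] for j in kept if j != i])
            if variance < best_variance:
                best_index, best_variance = i, variance
        if best_index is None or current - best_variance <= penalty:
            break  # no removal is worth losing another answer
        kept.remove(best_index)
        dropped.append(best_index)
    return kept, dropped

# Hypothetical content features per answer to one question:
# [normalised length, word overlap with the question].
ANSWERS = [
    [0.80, 0.60],
    [0.70, 0.70],
    [0.75, 0.65],
    [0.80, 0.70],
    [0.05, 0.00],  # very short, off-topic answer
]

kept, dropped = filter_low_quality(ANSWERS)
```

Because the comparison is made only among answers to the same question, the threshold for "low quality" adapts to each question, which matches the second assumption above.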
After low-quality answers are filtered, this work uses the text of question-answer pairs and the authority of the answerers on the community site to train an answer ranking model with Ranking SVM on best-answer data selected by community users, then re-ranks the answers to select the best one. Through these steps, this work builds an efficient and practical CQA system. In a test on 300 high-frequency questions from the query logs of a commercial search engine, it gives a correct answer for 78.0% of the questions and returns results for any question within 2 seconds, showing good effectiveness and practicality.
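As a rough illustration of how user-selected best answers can act as supervision, the sketch below pairs the best answer in a thread with every other answer, giving preference pairs that a pairwise learner such as Ranking SVM could train on, and then re-ranks a thread with a linear score. The field names, the two features (a text-match score and an answerer-authority score), and the placeholder weights are hypothetical and are not taken from the thesis.

```python
def preference_pairs(thread):
    """Pair the user-selected best answer with every other answer in the
    thread. Each pair is (preferred_features, other_features)."""
    best = [a for a in thread if a["is_best"]]
    rest = [a for a in thread if not a["is_best"]]
    return [(b["features"], r["features"]) for b in best for r in rest]

def rerank(weights, thread):
    """Order answers by a linear score over their features, highest first."""
    def score(answer):
        return sum(w * x for w, x in zip(weights, answer["features"]))
    return sorted(thread, key=score, reverse=True)

# Hypothetical thread: features are [text_match, answerer_authority].
THREAD = [
    {"id": "a1", "features": [0.4, 0.1], "is_best": False},
    {"id": "a2", "features": [0.7, 0.9], "is_best": True},
    {"id": "a3", "features": [0.6, 0.2], "is_best": False},
]

pairs = preference_pairs(THREAD)
# Placeholder weights; in practice they would come from the trained model.
order = [a["id"] for a in rerank([0.5, 0.5], THREAD)]
```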
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year conferred]: 2013
[Classification number]: TP391.1


Article ID: 2170221



Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/2170221.html

