Answer Selection for Non-Factoid Questions
[Abstract]: With the rise of question-and-answer communities, a growing volume of user-generated data has accumulated. This data is not only massive and diverse but also of high quality and highly reusable. To manage and exploit it efficiently, researchers have carried out extensive research and practice on it in recent years, and community question answering (community Q&A) is one of the most widely studied topics.
Community Q&A research builds on data from Q&A communities and differs markedly from traditional question answering. Traditional question answering systems mainly address factoid questions whose answers are phrases or named entities, and their main modules are question understanding and answer extraction. Community Q&A places no such restriction on question type and is especially suited to questions that ask for opinions and ideas. Research in this area covers question retrieval and recommendation, the degree of interest in a question, question and answer quality, answer ranking, user authority, and more. The key modules of community Q&A have attracted attention from both academia and industry.
The main work of this project is to build a community Q&A system on top of massive Q&A community data, and to study in depth the techniques involved in question analysis, question retrieval, and answer selection.
To build the community Q&A system, this project collected more than 130 million questions and 1 billion answers from community websites such as Yahoo! Answers. This scale differs significantly from, and has higher practical value than, earlier Q&A community research based on millions of records. On this data, automatic query classification is used to improve the efficiency and effectiveness of each query.
For question retrieval, this project proposes using the structural and semantic information of the query question and the candidate questions, and combines features from several different categories with a learning-to-rank algorithm. A ranking model is learned from training data to improve retrieval relevance and to address the word mismatch problem. Experiments show that the ranking model trained with Ranking SVM achieves a marked improvement in precision and other evaluation metrics over previous methods on different data sets.
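The learning-to-rank step described above can be illustrated with a minimal sketch of the pairwise idea behind Ranking SVM. This is not the thesis's implementation: the two features (a lexical-overlap score and a semantic-similarity score), the toy relevance labels, and all function names are assumptions made for illustration, and the trainer is a plain subgradient descent on a regularised pairwise hinge loss.

```python
from itertools import combinations

def pairwise_differences(features, relevance):
    """Turn one query's candidates into (difference vector, +1/-1) pairs."""
    pairs = []
    for i, j in combinations(range(len(features)), 2):
        if relevance[i] == relevance[j]:
            continue  # ties carry no ordering information
        diff = [a - b for a, b in zip(features[i], features[j])]
        label = 1 if relevance[i] > relevance[j] else -1
        pairs.append((diff, label))
    return pairs

def train_ranking_svm(queries, dim, epochs=200, lr=0.1, reg=0.01):
    """Subgradient descent on the L2-regularised pairwise hinge loss."""
    w = [0.0] * dim
    pairs = [p for feats, rel in queries for p in pairwise_differences(feats, rel)]
    for _ in range(epochs):
        for diff, label in pairs:
            margin = label * sum(wi * di for wi, di in zip(w, diff))
            for k in range(dim):
                grad = reg * w[k]
                if margin < 1:  # pair is misordered or inside the margin
                    grad -= label * diff[k]
                w[k] -= lr * grad
    return w

def score(w, x):
    """Linear ranking score of one candidate question."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical toy data: each candidate is [lexical overlap, semantic similarity],
# and a higher relevance label means a better match to the query question.
queries = [
    ([[0.9, 0.8], [0.4, 0.3], [0.1, 0.2]], [2, 1, 0]),
    ([[0.7, 0.9], [0.6, 0.1]], [1, 0]),
]
w = train_ranking_svm(queries, dim=2)
candidates = queries[0][0]
ranked = sorted(range(len(candidates)), key=lambda i: score(w, candidates[i]), reverse=True)
```

Training on difference vectors lets an ordinary linear classifier learn an ordering, which is why features from different categories can simply be concatenated into one vector.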
A new unsupervised learning method based on the content of questions and answers is proposed to assess answer quality and filter out low-quality answers. The method rests on three assumptions about Q&A communities: 1. under a given question, most answers are normal and only a few are low-quality answers that need to be filtered out; 2. low-quality answers can be detected by comparing them with the other answers to the same question; 3. different answers should have different criteria for judging answer quality. Based on these assumptions, the method uses content-based features and detects low-quality answers by minimizing the variance of the answer feature vectors while retaining as many answers as possible. Experiments show that the method achieves a significant increase in ROC value over the baseline method.
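The abstract does not give the exact objective, so the following is only one possible reading of "minimize variance while keeping as many answers as possible": a greedy filter that drops an answer only when doing so lowers the variance of the remaining feature vectors by more than a fixed penalty. The penalty value, the `min_keep` floor, the two-dimensional feature vectors, and the function names are all assumptions for illustration.

```python
def variance(vectors):
    """Mean squared distance of the vectors from their centroid."""
    n = len(vectors)
    dim = len(vectors[0])
    centroid = [sum(v[k] for v in vectors) / n for k in range(dim)]
    return sum(
        sum((v[k] - centroid[k]) ** 2 for k in range(dim)) for v in vectors
    ) / n

def filter_low_quality(vectors, penalty=0.05, min_keep=2):
    """Greedily drop answers while each drop cuts variance by more than `penalty`."""
    kept = list(range(len(vectors)))
    removed = []
    while len(kept) > min_keep:
        current = variance([vectors[i] for i in kept])
        # Find the single removal that lowers the variance the most.
        best_idx, best_var = None, current
        for i in kept:
            rest = [vectors[j] for j in kept if j != i]
            v = variance(rest)
            if v < best_var:
                best_idx, best_var = i, v
        # Stop when no removal pays for the cost of losing an answer.
        if best_idx is None or current - best_var <= penalty:
            break
        kept.remove(best_idx)
        removed.append(best_idx)
    return kept, removed

# Hypothetical content-feature vectors for five answers to one question;
# the last one sits far from the others and plays the low-quality answer.
answers = [[0.80, 0.70], [0.75, 0.72], [0.82, 0.68], [0.78, 0.70], [0.10, 0.05]]
kept, removed = filter_low_quality(answers)
```

Because the comparison is made only among answers to the same question, the threshold for "low quality" adapts to each question rather than being fixed globally.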
After low-quality answers are filtered out, this project uses the text of each question-answer pair together with the authority information of answerers on the community website. An answer ranking model is trained with the Ranking SVM algorithm on the best-answer data chosen by users in the Q&A community, and the answers are re-ranked to select the best one. Through the above steps, this project builds an efficient and practical community Q&A system. In a test on 300 high-frequency questions taken from commercial search engine logs, the system gave the correct answer for 78% of the questions and returned results within 2 seconds, which shows that the community question answering system is both effective and practical.
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP391.1