Research on Question-Type-Sensitive Answer Summarization Algorithms for Community Question Answering
Published: 2018-10-12 21:01
[Abstract]: With the rapid development of Web interaction technologies, Q&A communities such as Baidu Zhidao and Yahoo! Answers have accumulated a large volume of question-answer resources, offering a new way to resolve open-ended questions. However, quality problems in these question-answer pairs, such as low credibility and heavy noise, make the resources difficult to reuse. Mining high-quality resources from Q&A communities has therefore become an important research task in recent years. Many studies obtain high-quality question-answer pairs through answer quality evaluation, but few address the fact that a single answer to an open-ended question is often incomplete. Starting from the full set of answers, and with the goal of producing complete, high-quality answers, this thesis studies answer summarization algorithms tailored to different question types. The work covers three aspects:
(1) To design more effective answer summarization algorithms for different question types, the thesis studies question classification in Q&A communities. First, it proposes a two-level question taxonomy for Q&A communities. Second, it analyzes how community questions differ from factoid questions; in feature extraction, beyond the lexical and syntactic features used for factoid questions, it introduces community features of the question and analyzes how each kind of feature is distributed. Finally, it selects the best feature combination through incremental feature combination and introduces a two-stage active learning strategy that exploits unlabeled samples to improve question classification.
(2) The thesis represents answers by topic words, adapting traditional topic word extraction methods to answers so that topic words better capture their semantic content. It designs quantitative measures for evaluation criteria including answer coverage, relevance to the question, and content quality, and uses these measures to supervise the summarization process. After analyzing the characteristics of answers to each question type, it designs a sentence scoring function that combines coverage, relevance to the question, and content quality, and on this basis proposes answer summarization algorithms for three question types: consultation, opinion, and survey questions. Experimental results show that the proposed algorithms substantially improve summary quality.
(3) The thesis implements a question-type-sensitive answer summarization system that integrates the question classification and answer summarization methods above. By searching for a question in the system, users obtain the corresponding answer summary, which greatly improves the efficiency of information access. The system also improves how summaries are presented: summaries for survey questions are shown as charts, and summaries for opinion questions are listed item by item according to the sentiment polarity of the answers, both of which make the summaries more readable.
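The abstract describes a sentence scoring function that combines coverage, relevance to the question, and content quality, but it does not give the formula. The following is only an illustrative sketch of that general idea, not the thesis's method. Everything in it is an assumption made for the example: the function names, the linear weighting and its default weights, the whitespace tokenizer, Jaccard overlap as the relevance measure, the "fraction of newly covered topic words" coverage measure, and the greedy selection loop.

```python
from typing import List, Sequence, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenizer (assumed stand-in for real word segmentation)."""
    return set(text.lower().split())


def score_sentence(
    sentence: str,
    question: str,
    topic_words: Set[str],
    covered: Set[str],
    quality: float,
    weights: Tuple[float, float, float] = (0.4, 0.4, 0.2),  # hypothetical weights
) -> float:
    """Linear mix of coverage, relevance, and quality (all three measures are assumptions)."""
    tokens = tokenize(sentence)
    # Coverage: share of topic words this sentence adds beyond what is already covered.
    new_topics = (tokens & topic_words) - covered
    coverage = len(new_topics) / len(topic_words) if topic_words else 0.0
    # Relevance: Jaccard overlap between sentence tokens and question tokens.
    q_tokens = tokenize(question)
    union = tokens | q_tokens
    relevance = len(tokens & q_tokens) / len(union) if union else 0.0
    w_cov, w_rel, w_qual = weights
    return w_cov * coverage + w_rel * relevance + w_qual * quality


def summarize(
    question: str,
    sentences: Sequence[str],
    qualities: Sequence[float],
    topic_words: Set[str],
    max_sentences: int = 2,
) -> List[str]:
    """Greedily pick the highest-scoring sentence, updating covered topic words each round."""
    covered: Set[str] = set()
    remaining = list(range(len(sentences)))
    chosen: List[int] = []
    while remaining and len(chosen) < max_sentences:
        best = max(
            remaining,
            key=lambda i: score_sentence(
                sentences[i], question, topic_words, covered, qualities[i]
            ),
        )
        chosen.append(best)
        remaining.remove(best)
        covered |= tokenize(sentences[best]) & topic_words
    # Return the selected sentences in their original order.
    return [sentences[i] for i in sorted(chosen)]


if __name__ == "__main__":
    demo_sentences = [
        "read the official python tutorial",
        "python tutorial is official",
        "practice with small projects every day",
    ]
    print(
        summarize(
            "how to learn python",
            demo_sentences,
            qualities=[0.9, 0.5, 0.8],
            topic_words={"tutorial", "projects", "practice"},
        )
    )
```

Because coverage is recomputed after every pick, a sentence that only repeats topic words already in the summary loses its coverage credit, which is one simple way to discourage redundancy. A type-sensitive variant could plausibly use different weights per question type, though the abstract does not say how the three algorithms differ.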
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.09
Article ID: 2267580
Article link: http://sikaile.net/guanlilunwen/ydhl/2267580.html