
Research on Question-Type-Sensitive Answer Summarization Algorithms for Q&A Communities

Published: 2018-10-12 21:01
[Abstract]: With the rapid development of Web interaction technology, Q&A communities such as Baidu Zhidao and Yahoo! Answers have accumulated a large volume of question-answer resources, which offer a new way to solve open-ended questions. However, quality problems in community question-answer pairs, such as low credibility and heavy noise, make these resources hard to reuse. Mining high-quality resources from Q&A communities has therefore become an important research task in recent years. Many studies obtain high-quality question-answer pairs through answer quality evaluation, but few consider that a single answer to an open-ended question is often incomplete. Starting from the full set of answers, and aiming to obtain complete, high-quality answers, this thesis studies answer summarization algorithms for different types of questions. The work covers three aspects:

(1) To design more effective answer summarization algorithms for different question types, the thesis studies question classification in Q&A communities. First, it proposes a two-level question taxonomy for Q&A communities. Second, it analyzes the differences between factoid questions and community questions; in feature extraction, besides the lexical and syntactic features used for factoid questions, it introduces community features of the question and analyzes how each kind of feature is distributed. Finally, it selects the best feature combination through incremental feature combination, and introduces a two-stage active learning strategy that makes full use of unlabeled samples to improve question classification.

(2) The thesis represents answers by topic words, porting traditional topic word extraction methods to answer topic extraction so that topic words better express the semantics of an answer. It designs quantitative measures for evaluation criteria such as answer coverage, relevance to the question, and content quality, and uses these measures to guide the summarization process. After analyzing the characteristics of answers to each question type, it designs a sentence scoring function that combines coverage, relevance to the question, and content quality, and on that basis proposes separate answer summarization algorithms for three question types: consultation, opinion, and survey. Experimental results show that the proposed algorithms substantially improve summary quality.

(3) The thesis implements a question-type-sensitive answer summarization system that integrates the question classification and answer summarization methods described above. By searching for a question in the system, users obtain the corresponding answer summary, which greatly improves the efficiency of finding information. The system also improves how summaries are presented: summaries for survey questions are shown as charts, and summaries for opinion questions are listed item by item according to the sentiment polarity of the answers. Both changes make the answer summaries more readable.
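The abstract does not give the scoring formula, so the following is only a minimal sketch of how a sentence scoring function combining coverage, relevance to the question, and content quality might drive a greedy extractive summary. The linear weighting, the weights `w_cov`, `w_rel`, `w_qual`, the whitespace tokenizer standing in for topic word extraction, and the length-based quality proxy are all assumptions made for illustration, not the thesis's actual method.

```python
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case whitespace tokenizer; a stand-in for real topic word extraction."""
    return set(text.lower().split())


def score(sentence: str, question: str, covered: Set[str],
          w_cov: float = 0.4, w_rel: float = 0.4, w_qual: float = 0.2) -> float:
    """Hypothetical linear score; the weights are illustrative assumptions."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    q_words = tokenize(question)
    # Coverage: share of the sentence's words not yet present in the summary.
    coverage = len(words - covered) / len(words)
    # Relevance: share of the question's words that the sentence contains.
    relevance = len(words & q_words) / len(q_words) if q_words else 0.0
    # Quality: crude proxy that favors longer sentences, capped at 10 words.
    quality = min(len(words), 10) / 10
    return w_cov * coverage + w_rel * relevance + w_qual * quality


def summarize(question: str, sentences: List[str], k: int = 2) -> List[str]:
    """Greedily pick up to k sentences, re-scoring after each pick."""
    summary: List[str] = []
    covered: Set[str] = set()
    candidates = list(sentences)
    while candidates and len(summary) < k:
        best = max(candidates, key=lambda s: score(s, question, covered))
        summary.append(best)
        covered |= tokenize(best)
        candidates.remove(best)
    return summary
```

Because coverage is recomputed against the words already selected, an exact duplicate of a chosen sentence loses its coverage term and tends to be passed over in favor of a sentence that adds new content.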
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree conferred]: 2014
[Classification number]: TP393.09



