
Research on Answer Summarization Algorithms for Non-Factoid Questions in Community Question Answering Systems

Published: 2017-12-27 12:07

  Keywords: answer summarization algorithms for non-factoid questions in community question answering systems. Source: Shandong University, 2017 master's thesis. Document type: degree thesis


  More related articles: community question answering systems; sparse coding; short-text processing; document summarization


[Abstract]: In recent years, the number of users of community question answering (CQA) systems has grown rapidly. CQA systems give users a platform for posting questions and finding answers, and the massive volume of question-answer pairs on these platforms has become a new research focus for researchers in China and abroad. Many papers have addressed research topics in the CQA setting; this thesis focuses on answer summarization in CQA systems. Whereas most previous work concentrates on factoid questions, our focus is on non-factoid questions. A factoid question usually seeks a single definite answer, and most such answers are single sentences. A non-factoid question, by contrast, typically seeks views, perspectives, or opinions, so its answer usually requires multiple sentences or even an entire article. Compared with traditional multi-document summarization, which mainly targets news articles, answer summarization in non-factoid CQA faces its own challenges: the shortness and sparsity of answer sentences, and the diversity of answer content.

To address these challenges, we propose a sparse-coding-based answer summarization strategy with three core components: short-text expansion of answer sentences, vector representation of sentences, and a sparse coding optimization framework. Specifically, using entity linking and a question-answer sentence ranking strategy, we expand each answer sentence under a question into a richer representation composed of multiple Wikipedia sentences. On this basis, each sentence is represented as a feature vector by a convolutional neural network model designed for short text. Using these sentence vectors, we then formulate a sparse coding optimization framework that jointly considers the candidate answer sentences and the auxiliary Wikipedia sentences to estimate a distinctiveness score for every candidate sentence. Given these scores, we extract the highest-scoring answer sentences with a maximal marginal relevance (MMR) algorithm to produce the final answer summary.

The main contribution of this thesis is a solution to answer summarization for non-factoid questions in CQA systems that handles three problems: the shortness and sparsity of answer sentences and the diversity of answer content. We also ran experiments on a public benchmark dataset and compared against several recent baseline methods to evaluate the proposed approach. The results confirm the effectiveness of the proposed method and show a significant improvement in ROUGE scores over the most recent methods. Further analysis of the experimental results indicates that the proposed algorithm is stable and scalable.
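The extraction step described in the abstract, choosing high-scoring answer sentences with a maximal marginal relevance (MMR) algorithm, can be illustrated with a minimal sketch. The abstract does not give the exact formulation used in the thesis, so everything here is an assumption: the function names `cosine` and `mmr_select`, the `trade_off` weight, the use of cosine similarity as the redundancy measure, and the toy vectors and scores. In the thesis the scores would come from the sparse coding framework and the vectors from the convolutional network; this sketch takes both as given inputs.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_select(
    vectors: Sequence[Sequence[float]],
    scores: Sequence[float],
    k: int,
    trade_off: float = 0.7,  # hypothetical default, not taken from the thesis
) -> List[int]:
    """Greedy MMR: return indices of up to k sentences in selection order.

    Each step picks the sentence that maximizes
        trade_off * score - (1 - trade_off) * max similarity to selected.
    """
    if len(vectors) != len(scores):
        raise ValueError("vectors and scores must have the same length")

    selected: List[int] = []
    remaining = list(range(len(vectors)))

    while remaining and len(selected) < k:
        def mmr_value(i: int) -> float:
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return trade_off * scores[i] - (1.0 - trade_off) * redundancy

        best = max(remaining, key=mmr_value)
        selected.append(best)
        remaining.remove(best)

    return selected


# Toy input: sentences 0 and 1 are near-duplicates, sentence 2 is different.
toy_vectors = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
toy_scores = [0.9, 0.85, 0.5]
print(mmr_select(toy_vectors, toy_scores, k=2, trade_off=0.5))  # -> [0, 2]
```

With `trade_off=0.5` the second pick skips the near-duplicate sentence 1 despite its higher score, which is the diversity behaviour the abstract is after. Setting `trade_off=1.0` reduces the procedure to plain top-k selection by score.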
[Degree-granting institution]: Shandong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP391.1



Article ID: 1341632


Article link: http://sikaile.net/shoufeilunwen/xixikjs/1341632.html

