Research on a Multi-Document Automatic Summarization Algorithm Integrating Sentence-Meaning Features
Published: 2019-04-08 13:49
[Abstract]: Multi-document automatic summarization is one of the key problems in natural language processing. To make extracted summaries better reflect the topics of a document set, this paper proposes, on the basis of sub-topic division, a sentence selection method that integrates sentence-meaning features. Based on the sentence-meaning structure model, the method extracts features such as topic and predicate from the sentence-meaning structure and fuses them with statistical features to build a feature vector from which sentence weights are computed. The summary is then extracted by combining comprehensive weighted selection with maximal marginal relevance (MMR). Experiments and evaluation were carried out on text collections covering different topics; at a summary compression ratio of 15%, the system summaries reach an average precision of 66.7% and an average recall of 65.5%. The results indicate that introducing sentence-meaning features can effectively improve multi-document summarization.
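The final selection step named in the abstract, maximal marginal relevance applied to weighted sentences, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the feature set, the weighting formula, or the similarity measure, so the sketch assumes sentence weights are already computed, uses bag-of-words cosine similarity as the redundancy measure, treats the 15% compression ratio as a fraction of the sentence count, and picks an arbitrary trade-off value `lam=0.7`. The names `cosine` and `mmr_select` are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def mmr_select(sentences, weights, ratio=0.15, lam=0.7):
    """Greedily pick sentences, trading weight against redundancy (MMR).

    sentences: list of token lists, one per candidate sentence.
    weights:   precomputed sentence weights (assumed given; the paper
               derives them from sentence-meaning and statistical features).
    ratio:     fraction of sentences to keep (assumption: the compression
               ratio is applied to the sentence count).
    lam:       relevance/redundancy trade-off (assumed value).
    Returns the indices of the selected sentences in document order.
    """
    vectors = [Counter(tokens) for tokens in sentences]
    budget = max(1, round(len(sentences) * ratio))
    selected = []
    candidates = set(range(len(sentences)))

    while candidates and len(selected) < budget:
        def score(i):
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return lam * weights[i] - (1 - lam) * redundancy

        # Iterate in index order so ties resolve deterministically.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)

    return sorted(selected)
```

With four candidate sentences where the second duplicates the first, `mmr_select(sentences, weights, ratio=0.5)` skips the duplicate even though its weight is high, because the redundancy penalty outweighs it.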
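[Affiliation]: School of Information and Electronics, Beijing Institute of Technology
[Funding]: National "242" Program funded project (2005C48); Beijing Institute of Technology Science and Technology Innovation Program, Major Project Cultivation Special Fund (2011CX01015)
[CLC number]: TP391.1
Article ID: 2454629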
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2454629.html