A TextRank Keyword Extraction Method Fusing Multiple Features

Published: 2018-05-20 23:19

Topic: TextRank algorithm + keyword extraction; Source: 《情報雜志》 (Journal of Intelligence), 2017, Issue 08

[Abstract]: [Purpose/Significance] Keyword extraction is widely used in natural language processing, and extracting keywords quickly and accurately has become a key problem in text processing. Many keyword extraction methods exist, but their accuracy still needs improvement. This paper therefore proposes a keyword extraction method that combines the internal structure information of a single document with the importance of a word both to that document and to the document set as a whole. [Method/Process] First, the importance of a word to the whole document set is computed from its average information entropy, and its importance within a single document is computed from its part-of-speech and position features. Next, the weights assigned to the three features are optimized through neural network training to fuse the features. Finally, the comprehensive weight of each word, computed from the three features, is used to improve the initial weights of the word nodes and the probability transition matrix of the TextRank model, and keywords are then extracted by iteration. [Results/Conclusions] The method combines information from the whole document set with information from the single document itself, and its keyword extraction accuracy is markedly higher than that of the traditional TextRank method and the TFIDF-TextRank method.
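The abstract gives no formulas, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes a linear fusion of three per-word feature scores, a co-occurrence window graph, transition probabilities proportional to the fused weight of the target word, and the normalized fused weights reused as the restart distribution. The names `fuse`, `cooccurrence_graph`, `weighted_textrank`, the coefficients, and all feature values are hypothetical; in the paper the fusion coefficients are learned by a neural network.

```python
from collections import defaultdict


def fuse(features, coeffs):
    """Linearly combine per-word feature scores (assumed fusion form)."""
    return {w: sum(c * f for c, f in zip(coeffs, fs)) for w, fs in features.items()}


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking each word to the next `window` tokens."""
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for u in tokens[i + 1 : i + 1 + window]:
            if u != w:
                neighbors[w].add(u)
                neighbors[u].add(w)
    return neighbors


def weighted_textrank(tokens, weight, window=2, d=0.85, iters=100, tol=1e-9):
    """Rank words; `weight` maps each word to a positive fused weight."""
    graph = cooccurrence_graph(tokens, window)
    words = list(graph)
    total = sum(weight[w] for w in words)
    # Initial scores: normalized fused weights instead of a uniform start.
    score = {w: weight[w] / total for w in words}
    base = dict(score)  # assumption: also used as the restart distribution
    # Normalizer for transitions leaving u: total fused weight of u's neighbors.
    out_mass = {u: sum(weight[v] for v in graph[u]) for u in words}
    for _ in range(iters):
        new = {}
        for v in words:
            # P(u -> v) = weight[v] / out_mass[u] for each neighbor u of v.
            incoming = sum(score[u] * weight[v] / out_mass[u] for u in graph[v])
            new[v] = (1 - d) * base[v] + d * incoming
        delta = sum(abs(new[w] - score[w]) for w in words)
        score = new
        if delta < tol:
            break
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


# Made-up feature triples: (entropy score, part-of-speech score, position score).
features = {
    "textrank": (0.6, 1.0, 1.0),
    "keyword": (0.8, 1.0, 0.7),
    "extraction": (0.7, 0.8, 0.5),
    "graph": (0.4, 1.0, 0.3),
}
tokens = ["textrank", "keyword", "extraction", "keyword", "graph"]
weights = fuse(features, (0.4, 0.3, 0.3))  # hypothetical coefficients
for word, s in weighted_textrank(tokens, weights):
    print(f"{word}\t{s:.4f}")
```

Because each row of the transition matrix is normalized over a word's neighbors, the scores stay a probability distribution across iterations; setting every fused weight to the same value reduces the sketch to an unweighted TextRank on the same graph.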
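[Author affiliations]: School of Computers, Guangdong University of Technology; School of Art and Design, Guangdong University of Technology
[Funding]: Guangdong Province and Ministry Industry-University-Research Special Fund, enterprise innovation platform project "Research on a User Data Mining System for the Home Appliance Industry and Experiential Design Innovation Services" (No. 2013B090800042)
[Classification number]: TP391.1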

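[Similar literature]

Related journal papers (top 1)

1 Xia Tian; Study on Keyword Extraction Using Word Position Weighted TextRank [J]; 現代圖書情報技術 (New Technology of Library and Information Service); 2013, Issue 09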


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1916683.html
