Keyword Extraction with TextRank Weighted by Word-Vector Clustering

Published: 2019-06-27 18:25

Abstract: [Objective] To integrate the world knowledge contained in Wikipedia into the TextRank model in the form of word vectors, and so improve keyword extraction from single documents. [Methods] A Word2Vec model is trained on Chinese Wikipedia data to produce word vectors. The word vectors of the nodes in the TextRank word graph are clustered in order to adjust the voting importance of the nodes within each cluster. Together with the coverage and position factors of each node, this is used to compute the random jump probabilities between nodes and to build the transition matrix. Node importance scores are then obtained by iterative computation, and the top N words are selected as keywords. [Results] When TopN ≤ 7, the word-vector clustering weighting method outperforms all comparison methods. At TopN = 3 the F-value reaches its maximum, an improvement of 3.374% over the previous best result. When TopN > 7, the results are similar to those of the position-weighting method. [Limitations] Cluster analysis increases the computational cost. [Conclusions] Word-vector clustering weighting can improve keyword extraction.
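The pipeline in the abstract (node weighting, transition matrix, iterative scoring, top-N selection) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything here is an assumption for illustration only: the names `build_graph`, `weighted_textrank` and `top_keywords` are hypothetical, `cluster_weight` stands in for per-word weights that would be derived from clustering Word2Vec vectors (no clustering is performed here), the position factor is a simple "earlier is heavier" rule, and the coverage factor is left out.

```python
from collections import defaultdict


def build_graph(tokens, window=3):
    """Undirected co-occurrence graph: words within `window` tokens are linked."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    return neighbors


def weighted_textrank(tokens, cluster_weight, window=3, damping=0.85,
                      iterations=50):
    """Score words with a TextRank whose jump probabilities favor heavier nodes."""
    neighbors = build_graph(tokens, window)
    words = list(neighbors)

    # Assumed position factor: a word that first appears earlier weighs more.
    first_seen = {}
    for i, word in enumerate(tokens):
        first_seen.setdefault(word, i)
    position = {w: 1.0 - first_seen[w] / len(tokens) for w in words}

    # Assumed node weight: cluster weight (default 1.0) times position factor.
    node_weight = {w: cluster_weight.get(w, 1.0) * position[w] for w in words}

    # Transition matrix: P(i -> j) is j's share of the weight among i's neighbors.
    transition = {}
    for i in words:
        total = sum(node_weight[j] for j in neighbors[i])
        transition[i] = {j: node_weight[j] / total for j in neighbors[i]}

    # Iterate the damped random walk from a uniform start.
    scores = {w: 1.0 / len(words) for w in words}
    for _ in range(iterations):
        updated = {}
        for j in words:
            incoming = sum(scores[i] * transition[i][j] for i in neighbors[j])
            updated[j] = (1 - damping) / len(words) + damping * incoming
        scores = updated
    return scores


def top_keywords(tokens, cluster_weight, n=3):
    """Return the n highest-scoring words."""
    scores = weighted_textrank(tokens, cluster_weight)
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

Calling `top_keywords(tokens, cluster_weight, n=3)` on a segmented, stop-word-filtered document returns three candidate keywords. Because each row of the transition matrix sums to 1, the scores stay normalized across iterations, which makes it easy to compare runs with different weighting choices.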

Article ID: 2507032


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2507032.html
