Keyword Extraction with TextRank Weighted by Word Vector Clustering
Published: 2019-06-27 18:25
[Abstract]: [Objective] To incorporate the world knowledge contained in Wikipedia into the TextRank model in the form of word vectors, and thereby improve keyword extraction from single documents. [Methods] A Word2Vec model is trained on Chinese Wikipedia data to produce word vectors. The word vectors of the nodes in the TextRank word graph are clustered to adjust the voting importance of the nodes within each cluster. Together with node coverage and position factors, these are used to calculate the random jump probabilities between nodes and build the transition matrix. Node importance scores are then obtained by iterative computation, and the top N words are selected as keywords. [Results] When TopN ≤ 7, the word vector clustering weighted method outperforms all comparison methods. At TopN = 3 the F-value reaches its maximum, an incremental improvement of 3.374% over the previous best result. When TopN > 7, the results are similar to those of the position weighting method. [Limitations] Cluster analysis increases the computational cost. [Conclusions] Weighting by word vector clustering can improve keyword extraction.
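The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the weighting formulas, so everything specific here is an assumption made for illustration: the helper names (`build_cooccurrence`, `cluster_weights`, `position_weights`, `weighted_textrank`), the centroid-distance cluster weight, the fixed boost for early words, the toy two-dimensional vectors standing in for Word2Vec output, and the hand-assigned cluster labels. The coverage factor is omitted. Only the overall shape (node weights bias the transition matrix, scores come from iteration, the top N words are returned) follows the abstract.

```python
import numpy as np

def build_cooccurrence(tokens, window=2):
    """Symmetric co-occurrence counts between distinct words in a sliding window."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    adj = np.zeros((len(vocab), len(vocab)))
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            a, b = index[w], index[tokens[j]]
            if a != b:
                adj[a, b] += 1.0
                adj[b, a] += 1.0
    return vocab, adj

def cluster_weights(vectors, labels):
    """Assumed rule: words nearer their cluster centroid get a larger weight."""
    weights = np.empty(len(labels))
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        centroid = vectors[members].mean(axis=0)
        dist = np.linalg.norm(vectors[members] - centroid, axis=1)
        weights[members] = 1.0 / (1.0 + dist)
    return weights

def position_weights(tokens, vocab, lead=3, boost=2.0):
    """Assumed rule: words seen among the first `lead` tokens get a fixed boost."""
    early = set(tokens[:lead])
    return np.array([boost if w in early else 1.0 for w in vocab])

def weighted_textrank(adj, node_weight, damping=0.85, tol=1e-8, max_iter=200):
    """Power iteration; the jump from i to j is biased by node_weight[j]."""
    n = adj.shape[0]
    biased = adj * node_weight[np.newaxis, :]
    row_sums = biased.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Rows without outgoing edges fall back to a uniform jump.
    transition = np.where(row_sums > 0, biased / safe_sums, 1.0 / n)
    scores = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        updated = (1 - damping) / n + damping * (scores @ transition)
        converged = np.abs(updated - scores).sum() < tol
        scores = updated
        if converged:
            break
    return scores

# Toy input: tokens of one document, already segmented and filtered.
tokens = ["keyword", "extraction", "graph", "keyword",
          "ranking", "graph", "node", "keyword"]
vocab, adj = build_cooccurrence(tokens)

# Hypothetical 2-D stand-ins for Word2Vec vectors, in vocab order:
# extraction, graph, keyword, node, ranking.
vectors = np.array([[0.9, 0.1], [0.1, 0.9], [1.0, 0.2], [0.2, 1.0], [0.3, 0.8]])
labels = np.array([0, 1, 0, 1, 1])  # cluster assignments, assumed given

node_weight = cluster_weights(vectors, labels) * position_weights(tokens, vocab)
scores = weighted_textrank(adj, node_weight)
top_n = 3
keywords = [vocab[i] for i in np.argsort(-scores)[:top_n]]
print(keywords)
```

In practice the vectors would come from a Word2Vec model trained on the Chinese Wikipedia dump and the labels from a clustering step over those vectors; both are replaced by fixed toy values here so the sketch runs with NumPy alone.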
Article No.: 2507032
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2507032.html