Structural Function Recognition in Academic Text: An Application to Automatic Keyword Extraction
Published: 2018-09-11 09:19
[Abstract]: Most current research on automatic keyword extraction relies on statistical information about candidate words, such as term frequency and document frequency, and tends to ignore the internal structure of the academic text in which those candidates appear. As a result, extraction quality suffers. This paper treats an academic text as a set of five structural function domains and proposes a multi-feature extraction method that incorporates structural function features of the text. The structural function of each part is identified from the section headings of the paper. The method is then implemented in two ways, as SVM binary classification and as learning to rank with LambdaMART, on a collection of papers from the field of computational linguistics. The experimental results show that the proposed combined features improve keyword extraction substantially over the baseline features. In the classification experiment in particular, the relative improvement in accuracy reaches 10.75%, which demonstrates the importance of structural function features of academic text for automatic keyword extraction.
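The idea of deriving structural function features from section headings can be illustrated with a minimal sketch. The abstract does not name the five domains or say how headings are matched, so the domain labels, cue words, and function names below are all hypothetical, and simple substring matching stands in for whatever recognition method the paper actually uses.

```python
import re
from collections import defaultdict

# Hypothetical domain labels and cue words. The source states only that
# there are five structural function domains; it does not name them.
DOMAIN_CUES = {
    "introduction": ("introduction", "background", "motivation"),
    "related_work": ("related work", "literature review", "previous work"),
    "method": ("method", "approach", "model", "algorithm"),
    "experiment": ("experiment", "evaluation", "result"),
    "conclusion": ("conclusion", "discussion", "future work"),
}
DOMAINS = tuple(DOMAIN_CUES)


def identify_domain(section_title):
    """Return the first domain whose cue word occurs in the heading, else None."""
    title = section_title.lower()
    for domain, cues in DOMAIN_CUES.items():
        if any(cue in title for cue in cues):
            return domain
    return None


def structure_features(sections, candidate):
    """Count occurrences of a candidate phrase in each domain.

    sections: list of (heading, body_text) pairs.
    Returns one count per entry in DOMAINS, in that order.
    """
    counts = defaultdict(int)
    pattern = re.compile(r"\b" + re.escape(candidate.lower()) + r"\b")
    for title, body in sections:
        domain = identify_domain(title)
        if domain is not None:
            counts[domain] += len(pattern.findall(body.lower()))
    return [counts[d] for d in DOMAINS]


sections = [
    ("1 Introduction", "Keyword extraction matters."),
    ("3 Proposed Method", "We use keyword extraction with SVM. Keyword extraction is ranked."),
    ("5 Conclusions", "Done."),
]
features = structure_features(sections, "keyword extraction")
```

A vector like this could be concatenated with frequency-based baseline features and passed to a binary classifier or a learning-to-rank model; how the paper combines them is not specified in the abstract.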
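[Affiliation]: Information Retrieval and Knowledge Mining Laboratory, School of Information Management, Wuhan University
[Funding]: National Natural Science Foundation of China, General Program, "Lexical-Function-Oriented Semantic Recognition of Academic Text and Knowledge Graph Construction" (71473183); National Natural Science Foundation of China, General Program, "Citation Recommendation for Academic Literature Based on Multi-Semantic Information Fusion" (71673211)
[Classification number]: TP391.1
Article ID: 2236274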
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2236274.html