Structure-Function Recognition of Academic Text: Application in Automatic Keyword Extraction
Published: 2018-09-11 09:19
[Abstract]: Most current research on automatic keyword extraction relies on statistical information about candidate words, such as term frequency and document frequency, and often ignores the internal structure of the academic text in which the candidates appear, which leads to poor extraction results. This paper treats an academic text as a collection of five structure-function domains and proposes a multi-feature extraction method that incorporates structure-function features of the text, using section headings to identify the structure function of each part. The method is implemented with both SVM binary classification and the LambdaMART learning-to-rank algorithm on a corpus of computational linguistics literature. The experimental results show that the combined features yield a considerable improvement in keyword extraction over the baseline features; in the classification experiment in particular, the relative improvement in accuracy reaches 10.75%, which demonstrates the importance of structure-function features of academic text for automatic keyword extraction.
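As a rough illustration of the idea of recognizing structure function from section headings and turning it into per-candidate features, the following is a minimal sketch. It is not the authors' implementation: the abstract does not name the five domains, so the domain labels, the cue words in `DOMAIN_CUES`, and the helper names `classify_heading` and `domain_counts` are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical labels and cue words: the abstract states that there are five
# structure-function domains but does not name them, so these are assumptions.
DOMAIN_CUES: Dict[str, Tuple[str, ...]] = {
    "introduction": ("introduction", "background", "motivation"),
    "related_work": ("related work", "literature review", "previous work"),
    "method": ("method", "approach", "model", "framework"),
    "experiment": ("experiment", "evaluation", "result"),
    "conclusion": ("conclusion", "discussion", "future work"),
}


def classify_heading(heading: str) -> str:
    """Map a section heading to a domain by simple cue-word matching."""
    text = heading.lower()
    for domain, cues in DOMAIN_CUES.items():
        if any(cue in text for cue in cues):
            return domain
    return "unknown"


def domain_counts(candidate: str, sections: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count how often a candidate phrase occurs in each domain.

    `sections` is a list of (heading, body) pairs. The resulting counts
    could be appended to frequency-based features for a classifier or ranker.
    """
    counts = {domain: 0 for domain in DOMAIN_CUES}
    needle = candidate.lower()
    for heading, body in sections:
        domain = classify_heading(heading)
        if domain in counts:  # sections with unrecognized headings are skipped
            counts[domain] += body.lower().count(needle)
    return counts


if __name__ == "__main__":
    paper = [
        ("1 Introduction", "Keyword extraction matters. Keyword extraction is hard."),
        ("3 Proposed Method", "We rank candidates for keyword extraction."),
        ("5 Conclusions", "Done."),
    ]
    print(domain_counts("keyword extraction", paper))
```

In a fuller pipeline, a vector like the one returned by `domain_counts` would be concatenated with the statistical baseline features before training an SVM classifier or a LambdaMART ranker; cue-word matching is only a stand-in for whatever recognition procedure the paper actually uses.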
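[Author affiliation]: Information Retrieval and Knowledge Mining Laboratory, School of Information Management, Wuhan University
[Funding]: National Natural Science Foundation of China General Program, "Semantic Recognition of Academic Text and Knowledge Graph Construction Oriented to Lexical Function" (71473183); National Natural Science Foundation of China General Program, "Research on Citation Recommendation for Academic Literature Based on Multi-Semantic Information Fusion" (71673211)
[Classification number]: TP391.1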
Article ID: 2236274
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2236274.html