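Title Information Classification Based on HowNet Semantic Feature Expansion
Published: 2018-05-16 10:04
Topic of this paper: journal paper titles + short text classification; Reference: 《圖書館雜志》 (Library Journal), 2017, No. 02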
[Abstract]: This paper exploits the semantic relatedness within a text collection to obtain a set of semantic core words for the training set at two granularities: high-frequency words and latent topics. Using HowNet as an external resource, it then computes the similarity between the semantic core word set and the feature words in the test set, and expands the training-set feature words whose similarity exceeds a given threshold into the texts to be classified, which contain only a title. Finally, an SVM algorithm performs the classification. Experiments show that when both the training set and the test set consist of titles only, the improvement is largest, 3.1%, with 200 training documents per class, but it shrinks as the number of training texts grows. When the training set consists of titles plus abstracts and the test set of titles only, the proposed algorithm raises Macro_F1 by an average of 1.5% and 3.1% on the Fudan corpus and a self-built journal corpus respectively, and raises Micro_F1 by an average of 2.3% and 5.3% respectively. By expanding the features of feature-sparse title information, the paper aims to improve the classification of journal paper titles.
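The expansion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_similarity` and its lookup table are hypothetical stand-ins for a HowNet-based word similarity, the English words are placeholder data, and the threshold of 0.5 is an arbitrary assumption. The sketch only shows how core words from the training set might be appended to a title when they are similar enough to one of its words; the expanded word list would then be vectorized and passed to an SVM classifier.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical similarity scores standing in for HowNet word similarity.
# The real method would query HowNet; these values are made up.
TOY_SIMILARITY: Dict[FrozenSet[str], float] = {
    frozenset(("classification", "categorization")): 0.9,
    frozenset(("classification", "library")): 0.2,
    frozenset(("text", "document")): 0.8,
}


def toy_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1]; unknown pairs score 0.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def expand_title(
    title_words: List[str],
    core_words: List[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """Append each core word whose similarity to any title word exceeds the threshold."""
    expanded = list(title_words)
    for core in core_words:
        if core in expanded:
            continue  # already present, nothing to add
        if any(similarity(core, word) > threshold for word in title_words):
            expanded.append(core)
    return expanded


title = ["text", "classification"]
core = ["categorization", "document", "library"]
expanded_title = expand_title(title, core, toy_similarity, threshold=0.5)
print(expanded_title)
```

Here "categorization" and "document" are added because each scores above 0.5 against a title word, while "library" is left out. A higher threshold adds fewer words and keeps the title closer to its original, sparse form.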
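[Author affiliation]: School of Information Management, Wuhan University; Center for Studies of Information Resources, Wuhan University
[Funding]: One of the research outputs of the Social Science Fund project "Research on Automatic Classification of Multiple Types of Digital Text Resources" (project no. 15BTQ066)
[Classification number]: TP391.1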
Article ID: 1896422
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/1896422.html