Research on a Bibliographic Information Classification Method Based on a Compound Weighted LDA Model
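Published: 2018-06-16 13:11
Topics: text classification + LDA model; Source: Journal of the China Society for Scientific and Technical Information (《情报学报》), 2017, No. 04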
[Abstract]: Automatic classification of bibliographic information is important for the organization of information resources. This paper uses the probabilistic topic model LDA as the text representation model for bibliographic information, in order to overcome the feature sparsity caused by short texts. Two different feature weighting strategies are implemented, one based on the structural layout of a bibliographic record and one based on the category-discriminating ability of the class the record belongs to. A compound weighting strategy is then built on these two, so that the resulting feature word set is not skewed toward high-frequency words and better represents the category of the bibliographic information. The compound weighting strategy is integrated into LDA, yielding a bibliographic information classification method based on compound weighted LDA. Comparative experiments on public and self-built bibliographic corpora verify and analyze the effectiveness of the compound weighting strategy. The experiments show that the proposed compound weighted LDA classification method outperforms LDA classification methods that consider only one of the two feature weighting strategies.
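The abstract gives no formulas, so the following is only a minimal sketch of how a compound weight of this kind might be computed before topic modeling. It is not the authors' method. The field names, the values in `FIELD_WEIGHTS`, and the logarithmic category score are all assumptions chosen for illustration. The sketch multiplies a structure-based weight (where in the record a word appears) by a category-discrimination weight (how few categories a word occurs in) to produce weighted pseudo-counts per record.

```python
import math
from collections import Counter, defaultdict

# Hypothetical field weights: the abstract does not state any values.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "summary": 1.0}


def structure_weights(record):
    """Sum the field weight over every occurrence of each word in one record."""
    weights = Counter()
    for field, words in record["fields"].items():
        for word in words:
            weights[word] += FIELD_WEIGHTS.get(field, 1.0)
    return weights


def category_weights(records):
    """Assumed discrimination score: log(1 + C / c_w).

    C is the number of categories in the corpus and c_w is the number of
    categories in which word w occurs, so words spread over many
    categories score lower than words confined to a few.
    """
    categories_of = defaultdict(set)
    all_categories = set()
    for record in records:
        all_categories.add(record["category"])
        for words in record["fields"].values():
            for word in words:
                categories_of[word].add(record["category"])
    total = len(all_categories)
    return {w: math.log(1 + total / len(c)) for w, c in categories_of.items()}


def compound_weighted_counts(records):
    """Multiply the two weights to get weighted pseudo-counts per record."""
    cat_w = category_weights(records)
    return [
        {w: s * cat_w[w] for w, s in structure_weights(r).items()}
        for r in records
    ]


# Toy records with made-up categories and words.
records = [
    {"category": "TP", "fields": {"title": ["lda", "model"],
                                  "summary": ["text", "model"]}},
    {"category": "G2", "fields": {"title": ["library", "catalog"],
                                  "summary": ["text", "catalog"]}},
]
weighted = compound_weighted_counts(records)
```

In this toy corpus, "text" occurs in both categories and only in the summary field, so it receives the smallest weight, while "model" occurs in one category and in both the title and the summary, so it receives the largest. How such pseudo-counts would enter LDA inference (for example, in place of raw term counts) is likewise an assumption here.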
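【Author affiliation】: School of Information Management, Wuhan University
【Funding】: National Social Science Fund of China project "Research on Automatic Classification of Multiple Types of Digital Text Resources" (15BTQ066)
【Classification number】: TP391.1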
Article ID: 2026819
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2026819.html