An Online LDA Algorithm Based on a Dynamic Vocabulary
Published: 2018-11-07 11:33
[Abstract]: Most current online latent Dirichlet allocation (LDA) algorithms rely on a fixed vocabulary. In practice the vocabulary often fails to match the corpus being processed, which limits the model's practicality. To address this, within the belief propagation (BP) framework, the topic-word distribution is made to follow a Dirichlet process and the update formulas are rederived, so that the vocabulary is empty before the model runs and newly discovered words are added to it continually during processing. Experimental results show that the new dynamic-vocabulary algorithm not only fits the vocabulary more closely to the corpus, but also outperforms LDA models based on a fixed vocabulary on two metrics: perplexity and the mutual information index.
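The vocabulary behaviour described in the abstract (empty at start, extended whenever an unseen word arrives) can be illustrated with a minimal Python sketch. This covers only the bookkeeping side; it does not implement the paper's BP updates or the Dirichlet process derivation. The class name `DynamicVocabulary`, its methods, and the zero-initialised per-topic rows are assumptions made for illustration, not taken from the paper.

```python
from typing import Dict, List


class DynamicVocabulary:
    """Hypothetical helper: a vocabulary that starts empty and grows on demand."""

    def __init__(self, num_topics: int) -> None:
        self.word_to_id: Dict[str, int] = {}
        # topic_word[k][w] is an illustrative statistic for word id w in topic k.
        self.topic_word: List[List[float]] = [[] for _ in range(num_topics)]

    def __len__(self) -> int:
        return len(self.word_to_id)

    def add(self, word: str) -> int:
        """Return the id of `word`, registering it first if it is new."""
        if word not in self.word_to_id:
            self.word_to_id[word] = len(self.word_to_id)
            # Assumption: a new word starts with zero mass in every topic.
            for row in self.topic_word:
                row.append(0.0)
        return self.word_to_id[word]

    def encode(self, tokens: List[str]) -> List[int]:
        """Map a mini-batch of tokens to ids, growing the vocabulary as needed."""
        return [self.add(token) for token in tokens]


vocab = DynamicVocabulary(num_topics=2)
batch1 = vocab.encode(["topic", "model", "topic"])
batch2 = vocab.encode(["model", "stream"])  # "stream" is first seen here
print(batch1, batch2, len(vocab))  # prints: [0, 1, 0] [1, 2] 3
```

In an online setting each incoming mini-batch would pass through `encode` before the inference step, so the topic-word statistics always have one column per word seen so far.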
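[Affiliation]: School of Computer Science and Technology, Soochow University
[Funding]: Supported by the National Natural Science Foundation of China (61373092, 61572339, 61272449) and the Key Project of the Jiangsu Province Science and Technology Support Program (BE2014005)
[Classification Number]: TP391.1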
Article ID: 2316236
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2316236.html