

An Online LDA Algorithm Based on a Dynamic Vocabulary

Published: 2018-11-07 11:33
[Abstract]: Most current online latent Dirichlet allocation (LDA) algorithms assume a fixed vocabulary. In practice, the vocabulary often fails to match the corpus being processed, which limits the model's usefulness. To address this, within the belief propagation (BP) framework, the topic-word distributions are made to follow a Dirichlet process and the update formulas are rederived, so that the vocabulary is empty before the model runs and newly discovered words are added to it continually during processing. Experiments show that the new dynamic-vocabulary algorithm not only fits the vocabulary more closely to the corpus but also outperforms fixed-vocabulary LDA on two metrics: perplexity and the mutual information index.
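The vocabulary behaviour described in the abstract (empty before the run, extended whenever an unseen word arrives) can be illustrated with a minimal sketch. This is not the paper's algorithm: it covers only the bookkeeping of a growing vocabulary and omits the belief propagation updates and the Dirichlet process prior entirely. The class name `DynamicVocabulary` and the toy document stream are assumptions made for illustration.

```python
from collections import Counter


class DynamicVocabulary:
    """Hypothetical helper: maps words to integer ids and starts empty."""

    def __init__(self):
        self.word_to_id = {}   # word -> integer id
        self.id_to_word = []   # integer id -> word

    def add_document(self, tokens):
        """Return {word_id: count} for one document, registering unseen words."""
        counts = Counter()
        for tok in tokens:
            if tok not in self.word_to_id:
                # A new word gets the next free id, so existing ids never change.
                self.word_to_id[tok] = len(self.id_to_word)
                self.id_to_word.append(tok)
            counts[self.word_to_id[tok]] += 1
        return dict(counts)

    def __len__(self):
        return len(self.id_to_word)


# Toy document stream (illustrative data, not from the paper).
vocab = DynamicVocabulary()
stream = [["topic", "model", "topic"], ["online", "model", "stream"]]
sizes = []
for doc in stream:
    bow = vocab.add_document(doc)  # an online learner would consume `bow` here
    sizes.append(len(vocab))       # vocabulary size after each document
```

Because ids are assigned in order of first appearance and never reused, any per-word statistics an online model keeps can be extended by appending entries for the new ids rather than being rebuilt.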
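[Affiliation]: School of Computer Science and Technology, Soochow University
[Funding]: Supported by the National Natural Science Foundation of China (61373092, 61572339, 61272449) and the Key Project of the Jiangsu Province Science and Technology Support Program (BE2014005)
[Classification number]: TP391.1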

Article ID: 2316236


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2316236.html

