Topic Evolution Analysis of News Comments Combining Word Vectors and Clustering Algorithms
Published: 2018-10-29 18:53
[Abstract]: Topic evolution analysis mines how the content of a topic changes over time. The content of a topic can be represented by keywords. word2vec is trained on 750,000 news articles and Weibo posts to obtain a word vector model. The text stream is preprocessed and fed into the model to obtain the vectors of all words in the time series, and K-means is used to cluster these word vectors, which yields the topic keywords. Experiments compare the results with topic extraction based on the PLSA and LDA topic models and find that the proposed approach extracts topics better than the topic-model methods. In addition, collecting a sufficiently large and content-rich corpus makes it possible to train a model with relatively strong generalization ability, which benefits research on real-time topic evolution analysis.
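The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors in `TOY_VECTORS` are made-up stand-ins for vectors that would normally be looked up in a trained word2vec model, and the names `kmeans` and `topic_keywords` are hypothetical helpers written only for this example. The K-means here is plain Lloyd's algorithm using the Python standard library.

```python
import math
import random

# Hypothetical 2-D "word vectors" (assumption for illustration only).
# A real pipeline would look these up in a trained word2vec model.
TOY_VECTORS = {
    "election": (0.90, 0.10),
    "vote": (0.85, 0.20),
    "candidate": (0.80, 0.05),
    "typhoon": (0.10, 0.90),
    "rainfall": (0.20, 0.85),
    "flood": (0.05, 0.80),
}


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: return one cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def topic_keywords(words, vectors, k):
    """Cluster the words of one time slice; each cluster is one topic's keyword set."""
    known = [w for w in words if w in vectors]  # skip out-of-vocabulary words
    labels = kmeans([vectors[w] for w in known], k)
    clusters = {}
    for word, label in zip(known, labels):
        clusters.setdefault(label, []).append(word)
    return sorted(sorted(group) for group in clusters.values())


if __name__ == "__main__":
    for topic in topic_keywords(list(TOY_VECTORS), TOY_VECTORS, k=2):
        print(topic)
```

To follow a topic over time under this sketch, `topic_keywords` would be called once per time window and the resulting keyword sets compared across windows. The number of clusters `k` is a free parameter here; the abstract does not say how it was chosen.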
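[Author affiliation]: Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies; Cisco School of Informatics, Guangdong University of Foreign Studies
[Funding]: National Social Science Fund of China project (12BYY045); Guangdong Province Philosophy and Social Sciences "Twelfth Five-Year Plan" project (GD15YTS01)
[Classification number]: TP391.1
Article ID: 2298510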
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2298510.html