Hot Topic Extraction and Tracking Based on Chinese Microblogs
[Abstract]: Since its launch, Weibo has changed how people obtain news and follow current events through its broad user participation. In recent years, many breaking news stories and hot topics have been released through the microblogging platform, whose speed and reach of dissemination are unmatched by traditional media. At present, Sina Weibo alone receives hundreds of millions of posts per day. This huge volume of data covers all aspects of people's lives and contains a great deal of valuable topic information. Extracting these hot topics correctly is of great significance for understanding the latest hot topics of public opinion and grasping its trends. However, at this scale of data, relying on manual effort alone to process microblog posts is far from sufficient. In addition, microblog posts are short texts with severe data sparsity, so some traditional topic extraction and tracking algorithms cannot be applied to them directly. First, an improved topic extraction model, MF-LDA (Microblog Features Latent Dirichlet Allocation), is proposed to extract hot topics from microblogs. This model improves the traditional LDA (Latent Dirichlet Allocation) model by combining five features unique to microblogs: likes, comments, forwards, posting time, and user authority. The numbers of likes, forwards, and comments are used to calculate the attention a microblog receives, and user authority is used to calculate its authority value. Microblogs are then assigned to time windows according to their posting time, and the word frequencies of the posts are counted within each time window. The higher a word's probability, the more likely it is to be a hot-topic word. Second, this paper tracks hot topics from two aspects: structure and content.
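The attention weighting and per-window word counting described for MF-LDA above can be illustrated with a small sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the `Post` fields, the log-damped weighted sum in `attention`, the way `post_weight` combines attention with authority, and the default weights. The sketch covers only the feature-weighted counting step, not LDA inference itself.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Post:
    words: List[str]      # tokenized post text
    likes: int
    comments: int
    forwards: int
    authority: float      # hypothetical user-authority score in [0, 1]
    posted_at: datetime


def attention(post: Post, w_like: float = 1.0, w_comment: float = 1.0,
              w_forward: float = 1.0) -> float:
    """Assumed attention score: log-damped weighted sum of engagement counts."""
    raw = w_like * post.likes + w_comment * post.comments + w_forward * post.forwards
    return math.log1p(raw)


def post_weight(post: Post) -> float:
    """Assumed combination of attention and authority into one post weight."""
    return (1.0 + attention(post)) * (1.0 + post.authority)


def weighted_word_probs(posts: List[Post], start: datetime,
                        width: timedelta) -> Dict[int, Dict[str, float]]:
    """Assign posts to time windows and return weighted word probabilities per window."""
    totals: Dict[int, Counter] = defaultdict(Counter)
    for post in posts:
        window = (post.posted_at - start) // width  # integer window index
        weight = post_weight(post)
        for word in post.words:
            totals[window][word] += weight
    probs: Dict[int, Dict[str, float]] = {}
    for window, counter in totals.items():
        total = sum(counter.values())
        probs[window] = {word: count / total for word, count in counter.items()}
    return probs
```

Under these assumptions, a word that appears in heavily liked, commented, and forwarded posts from authoritative users gets a higher probability within its time window than a word from posts with no engagement.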
To track topic structure, this paper first constructs the Hot Topic Life Cycle Model (HTLCM), which divides the topic life cycle into five stages (birth, growth, maturity, decline, and disappearance) by calculating the number of topics per unit time and the growth rate. To track topic content, this paper integrates the MF-LDA model with the HTLCM model and proposes the HTT (Hot Topic Tracking) algorithm. The candidate hot topics marked by the HTLCM model are allocated to time windows according to their publishing time, and the data of each time window is then input into the MF-LDA model to obtain the keywords most relevant to the hot topic in each time window. Analyzing how these keywords change shows how the topic's content evolves. Finally, to verify the validity of the proposed model and algorithm, experiments and analysis are carried out on real data sets. The experimental results show that, under the same conditions, the perplexity of the MF-LDA model is lower than that of the LDA model, while its coverage rate is higher. The HTT algorithm can not only keep tracking hot topics but also effectively discover potential hot topics. These results indicate that the proposed model and method are effective and of practical significance for hot topic extraction and tracking.
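The five-stage life cycle described above can be sketched as a simple rule-based classifier over per-window counts. The abstract does not state HTLCM's actual rules, so the thresholds `min_count` and `rate_threshold`, the function names, and the decision order below are hypothetical and only illustrate how a count and a growth rate could map to a stage.

```python
from typing import List


def growth_rate(prev: int, curr: int) -> float:
    """Relative change in count between two consecutive time windows."""
    if prev == 0:
        return float("inf") if curr > 0 else 0.0
    return (curr - prev) / prev


def classify_stage(prev: int, curr: int, min_count: int = 5,
                   rate_threshold: float = 0.2) -> str:
    """Map a (previous count, current count) pair to one life-cycle stage.

    Both thresholds are illustrative assumptions, not values from the thesis.
    """
    if curr < min_count:
        # Low volume: rising from below means birth, otherwise disappearance.
        return "birth" if curr > prev else "disappearance"
    rate = growth_rate(prev, curr)
    if rate > rate_threshold:
        return "growth"
    if rate < -rate_threshold:
        return "decline"
    return "maturity"


def label_lifecycle(counts: List[int]) -> List[str]:
    """Label each time window of a topic's count series with a stage."""
    stages = []
    prev = 0
    for curr in counts:
        stages.append(classify_stage(prev, curr))
        prev = curr
    return stages
```

For a count series such as `[2, 10, 40, 42, 20, 3, 0]`, this sketch labels the windows birth, growth, growth, maturity, decline, disappearance, disappearance.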
[Degree-granting institution]: Xihua University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1; TP393.092
[Citation]: Ye Yongtao (葉永濤). Hot Topic Extraction and Tracking Based on Chinese Microblogs [D]. Xihua University, 2017.
Article ID: 2178078
Article link: http://sikaile.net/guanlilunwen/ydhl/2178078.html