Research on Weibo Topic Detection and Tracking Methods
[Abstract]: Weibo, one of the most popular social applications, has become a primary channel through which people obtain and disseminate information. Weibo data is in effect a high-speed, massive, dynamic information stream that reflects social topics and how they change from moment to moment. Detecting and tracking topics in this stream is therefore of great value for public opinion monitoring and public opinion surveys. Against this background, this thesis proposes a clustering algorithm that can handle large-scale data streams, applies it to Weibo topic detection and tracking, and reports good results. The proposed method, Affinity Propagation in Massive Data Stream (APMStream), is a large-scale data stream clustering method based on affinity propagation and consists of four parts: initial clustering, online clustering, cluster adjustment, and cluster maintenance. For initial clustering, the Affinity Propagation (AP) algorithm is improved in two respects, distributed iteration and dynamic adjustment of the damping coefficient, to make it suitable for large-scale data. Online clustering processes each tuple in real time, either merging it into an existing cluster or creating a new cluster, depending on its distance from the existing clusters. Cluster adjustment first re-selects the cluster centers and then clusters the new centers using a weighted AP algorithm. Cluster maintenance keeps the system load within a reasonable range by deleting clusters that have not been updated for a long time and tuples of low importance. Applying APMStream to topic detection and tracking mainly involves measuring the importance of each Weibo post and calculating the distance between Weibo posts. The importance of a post serves as the preference parameter of the AP algorithm and determines the probability that the post becomes a cluster center.
The distance between Weibo posts is calculated with a method based on common lexical blocks and is used to construct the similarity matrix of the AP algorithm. APMStream is implemented as a topology on the distributed stream processing framework Apache Storm, so that data processing is distributed across the nodes of the topology. Experimental results show that APMStream can process large-scale Weibo data streams quickly, detect Weibo topics, and reflect how topics evolve over time.
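The online clustering and cluster maintenance steps described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the class and parameter names (`OnlineClusterer`, `radius`, `max_idle`) are hypothetical, plain Euclidean distance stands in for the lexical-block distance, and cluster centers stay fixed because the cluster adjustment step is omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Stand-in distance; the thesis uses a lexical-block distance instead."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


@dataclass
class Cluster:
    center: Vector
    size: int = 1
    last_updated: float = 0.0


@dataclass
class OnlineClusterer:
    radius: float    # assumed merge threshold on distance to a center
    max_idle: float  # assumed time after which an idle cluster is dropped
    distance: Callable[[Vector, Vector], float] = euclidean
    clusters: List[Cluster] = field(default_factory=list)

    def process(self, point: Vector, timestamp: float) -> int:
        """Merge the tuple into the nearest cluster, or create a new one.

        Returns the index of the cluster that received the tuple.
        """
        if self.clusters:
            nearest = min(
                range(len(self.clusters)),
                key=lambda i: self.distance(point, self.clusters[i].center),
            )
            cluster = self.clusters[nearest]
            if self.distance(point, cluster.center) <= self.radius:
                cluster.size += 1
                cluster.last_updated = timestamp
                return nearest
        # No cluster is close enough: the tuple becomes a new cluster center.
        self.clusters.append(Cluster(center=tuple(point), last_updated=timestamp))
        return len(self.clusters) - 1

    def maintain(self, now: float) -> int:
        """Delete clusters idle for longer than max_idle; return how many."""
        before = len(self.clusters)
        self.clusters = [
            c for c in self.clusters if now - c.last_updated <= self.max_idle
        ]
        return before - len(self.clusters)
```

In this sketch, `process` would be called once per incoming tuple and `maintain` periodically. A fuller version would also track per-tuple importance so that low-importance tuples could be discarded, as the abstract describes.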
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1