

Research on Methods for Weibo Topic Detection and Tracking

Published: 2018-11-20 12:22
[Abstract]: Weibo is one of today's most popular social applications and has become a primary channel through which people obtain and spread information. Weibo data is effectively a high-speed, massive, and dynamic information stream. It reflects, moment by moment, the topics society is discussing and how they change. Detecting and tracking topics in this stream is therefore of great value for public-opinion oversight and opinion polling. Against this background, this thesis proposes a time-efficient clustering algorithm that can process large-scale data streams. It applies the algorithm to Weibo topic detection and tracking and obtains good results.

The proposed method, Affinity Propagation in Massive Data Stream (APMStream), consists of four parts: initial clustering, online clustering, cluster adjustment, and cluster maintenance.

For initial clustering, the Affinity Propagation (AP) algorithm is improved in two respects, distributed iteration and dynamic adjustment of the damping coefficient, so that it can handle large-scale data.

Online clustering processes each tuple in real time. Depending on the tuple's distance to the existing clusters, it either merges the tuple into a cluster or creates a new cluster.

Cluster adjustment first re-selects the cluster centers and then clusters the new centers with a weighted AP algorithm.

Cluster maintenance keeps the system load within a reasonable range by deleting clusters that have not been updated for a long time and tuples of low importance.

Applying APMStream to topic detection and tracking involves two main components: measuring the importance of each microblog post and computing the distance between posts. A post's importance is computed from the relationships between posts. It serves as the post's preference parameter in AP, which determines how likely the post is to become a cluster center. The distance between posts is computed with a common-word-block method and is used to construct AP's similarity matrix.

APMStream is implemented as a topology in the distributed stream-processing framework Apache Storm, with data processing spread across the nodes of the topology. Experiments show that APMStream can quickly process large-scale Weibo data streams, detect Weibo topics, and capture how those topics evolve over time.
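The initial-clustering step builds on Affinity Propagation. Below is a minimal, single-machine Python sketch of standard AP: responsibility and availability updates with damping. It assumes a precomputed similarity matrix `S` and a per-point preference vector, which is where the abstract says microblog importance enters.

Some parts are assumptions, not the thesis's method:

- The dynamic-damping rule raises the damping factor by a fixed step whenever the exemplar set changes, up to a cap. This is a heuristic chosen for the sketch.
- The distributed iteration is not shown.
- The function name and parameters are illustrative.

```python
def affinity_propagation(S, preferences, max_iter=200, damping=0.5,
                         damping_step=0.05, max_damping=0.95, stable_iters=15):
    """Cluster n points given an n x n similarity matrix S (higher = more similar).

    preferences[i] is written onto S[i][i]; a larger value makes point i more
    likely to become an exemplar (cluster center).
    Returns (exemplars, labels), where labels[i] is the exemplar index of point i.
    """
    n = len(S)
    S = [row[:] for row in S]  # work on a copy
    for i in range(n):
        S[i][i] = preferences[i]
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    exemplars, stable = [], 0

    for _ in range(max_iter):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max((vals[k] for k in range(n) if k != first),
                         default=float("-inf"))
            for k in range(n):
                best_other = second if k == first else vals[first]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)

        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new

        current = [k for k in range(n) if R[k][k] + A[k][k] > 0]
        if current == exemplars:
            stable += 1
            if current and stable >= stable_iters:
                break
        else:
            # Assumed dynamic-damping heuristic: damp harder while exemplars keep changing.
            stable = 0
            damping = min(max_damping, damping + damping_step)
        exemplars = current

    if not exemplars:  # fall back to the single most self-suitable point
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels
```

A quick usage example on two well-separated groups of 1-D points, using negative squared distance as similarity:

```python
points = [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]
S = [[-(a - b) ** 2 for b in points] for a in points]
exemplars, labels = affinity_propagation(S, [-1.0] * len(points))
```

Raising a point's preference makes it a stronger exemplar candidate. This is the lever the abstract describes for giving important posts a better chance of becoming topic centers.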
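The online-clustering step, as the abstract describes it, assigns each arriving tuple by its distance to the existing clusters. The hypothetical sketch below makes these assumptions:

- Tuples are numeric vectors, compared with Euclidean distance.
- A fixed merge `threshold` decides whether a tuple joins a cluster.
- Each cluster's center is a running mean. This is a simplification: an AP-based method would keep a real exemplar tuple instead.

`OnlineClusterer` and `Cluster` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class Cluster:
    center: list        # running mean of member vectors (simplification)
    count: int          # number of tuples merged so far
    last_update: float  # timestamp of the most recent merge


class OnlineClusterer:
    """Hypothetical online step: merge a tuple into the nearest cluster if it
    lies within `threshold`, otherwise open a new cluster for it."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.clusters = []

    def add(self, point, timestamp):
        """Process one tuple; return the index of the cluster it ends up in."""
        best_idx, best_d = None, math.inf
        for idx, c in enumerate(self.clusters):
            d = math.dist(point, c.center)
            if d < best_d:
                best_idx, best_d = idx, d
        if best_idx is not None and best_d <= self.threshold:
            c = self.clusters[best_idx]
            c.count += 1
            # Incremental mean: m_new = m + (x - m) / count
            c.center = [m + (x - m) / c.count for m, x in zip(c.center, point)]
            c.last_update = timestamp
            return best_idx
        self.clusters.append(Cluster(list(point), 1, timestamp))
        return len(self.clusters) - 1
```

Each tuple costs one pass over the current clusters. This is why the maintenance step, which bounds the number of live clusters, matters for throughput.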
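Cluster adjustment re-selects each cluster's center and then runs a weighted AP over the new centers. The sketch below shows one plausible reading of both sub-steps; it is not the thesis's exact formulation.

- **Center re-selection:** the center is re-selected as the medoid, i.e. the member with the smallest total distance to the others.
- **Weighted similarity:** the similarity of center i to center j is scaled by i's weight (for example, its cluster size), s(i, j) = -w_i * d(c_i, c_j)^2. This is an assumed form of the weighting.

The resulting matrix would then be passed to an AP routine. `reselect_center` and `weighted_similarity` are illustrative names.

```python
def reselect_center(members, dist):
    """Return the member with the smallest total distance to the others (a medoid)."""
    return min(members, key=lambda m: sum(dist(m, o) for o in members))


def weighted_similarity(centers, weights, dist, preference):
    """Similarity matrix for clustering cluster centers with a weighted AP.

    Assumed formulation: s(i, j) = -weights[i] * dist(c_i, c_j) ** 2 for i != j,
    so a center that stands for many tuples pays more to be represented by
    another exemplar. The diagonal holds the AP preference.
    """
    n = len(centers)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                S[i][j] = preference
            else:
                S[i][j] = -weights[i] * dist(centers[i], centers[j]) ** 2
    return S
```

For example, with 1-D centers `[0.0, 3.0]`, weights `[2, 5]`, and absolute difference as `dist`, the off-diagonal entries are -18.0 and -45.0. The heavier cluster is less willing to be absorbed.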
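Cluster maintenance bounds the load by dropping stale clusters and low-importance tuples. Below is a hypothetical sketch. It assumes:

- Each cluster records its last update time.
- Each cluster's members are stored as `(importance, tuple_id)` pairs.
- Fixed limits apply: `max_idle` for staleness and `max_members` per cluster.

The actual thresholds and eviction policy in the thesis are not stated in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    last_update: float
    # Members as (importance, tuple_id) pairs; higher importance = keep longer.
    members: list = field(default_factory=list)


def maintain(clusters, now, max_idle, max_members):
    """Drop clusters idle for more than `max_idle` time units, and trim each
    surviving cluster to its `max_members` most important tuples."""
    kept = []
    for c in clusters:
        if now - c.last_update > max_idle:
            continue  # stale cluster: its topic has gone quiet
        if len(c.members) > max_members:
            c.members = sorted(c.members, key=lambda m: m[0], reverse=True)[:max_members]
        kept.append(c)
    return kept
```

Running this periodically, rather than on every tuple, keeps the per-tuple cost of online clustering low.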
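The abstract says a post's importance is computed from the relationships between posts and then used as its AP preference, but it does not give the formula.

As a stand-in, the sketch below runs a PageRank-style power iteration over a graph of post-to-post relations, such as reposts or replies. It then maps the scores linearly onto preferences. Both the PageRank choice and the linear mapping are assumptions for illustration, and `importance_scores` and `to_preferences` are hypothetical names.

```python
def importance_scores(n, edges, d=0.85, iters=50):
    """PageRank-style importance over a post-relation graph (an assumed stand-in).

    edges: (src, dst) pairs meaning post `src` reposts, replies to, or otherwise
    points at post `dst`; importance flows from src to dst.
    """
    out = [[] for _ in range(n)]
    for src, dst in edges:
        out[src].append(dst)
    score = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - d) / n] * n
        for s in range(n):
            if out[s]:
                share = d * score[s] / len(out[s])
                for t in out[s]:
                    new[t] += share
            else:  # dangling post: spread its score uniformly
                for t in range(n):
                    new[t] += d * score[s] / n
        score = new
    return score


def to_preferences(scores, base, scale):
    """Map importance to AP preferences: more important posts get a higher
    (less negative) preference and are likelier to become exemplars."""
    top = max(scores)
    return [base + scale * s / top for s in scores]
```

In a star graph where posts 1 to 3 all repost post 0, post 0 gets the highest score and therefore the highest preference.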
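The post-to-post distance is based on "common word blocks", i.e. shared contiguous runs of words. The abstract does not give the scoring formula, so the sketch below makes several assumptions:

- Posts are already segmented into word tokens (Chinese text would need a word segmenter first).
- Common blocks are extracted greedily, longest first, without reusing tokens.
- The similarity is sum(len(block)^2) / (len(a) * len(b)). This rewards long shared runs over scattered single words and stays within [0, 1].

The function names are hypothetical.

```python
def common_blocks(a, b):
    """Greedily extract maximal common contiguous token runs between a and b,
    longest first, never reusing a token position."""
    used_a, used_b = [False] * len(a), [False] * len(b)
    blocks = []
    while True:
        best_len, best_i = 0, 0
        best_j = 0
        prev = [0] * (len(b) + 1)  # prev[j]: run length ending at a[i-2], b[j-1]
        for i in range(1, len(a) + 1):
            cur = [0] * (len(b) + 1)
            for j in range(1, len(b) + 1):
                if not used_a[i - 1] and not used_b[j - 1] and a[i - 1] == b[j - 1]:
                    cur[j] = prev[j - 1] + 1
                    if cur[j] > best_len:
                        best_len, best_i, best_j = cur[j], i, j
            prev = cur
        if best_len == 0:
            return blocks
        for t in range(best_len):  # mark the run's tokens as consumed
            used_a[best_i - 1 - t] = True
            used_b[best_j - 1 - t] = True
        blocks.append(a[best_i - best_len:best_i])


def block_similarity(a, b):
    """Assumed score: sum of squared block lengths over len(a) * len(b)."""
    if not a or not b:
        return 0.0
    return sum(len(blk) ** 2 for blk in common_blocks(a, b)) / (len(a) * len(b))


def block_distance(a, b):
    return 1.0 - block_similarity(a, b)
```

To build an AP similarity matrix from this, one option is `S[i][j] = -block_distance(posts[i], posts[j])`, with the diagonal filled by the importance-based preferences.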
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP391.1



