A Hot Topic Discovery Method Based on an Improved H-K Clustering Algorithm
[Abstract]: With the rapid development of social networks, microblogs (Weibo) have become one of the main platforms for communication and information dissemination in daily life. Within a very short time, a microblog platform can produce massive, scattered data sets, and users find it very difficult to pick out hot topics from this volume of text. How to mine hot topics quickly and accurately from massive microblog text data has therefore become an active research problem. Traditional topic discovery methods are usually based on feature word matching and do not consider the latent semantics of microblog text, so the quality of the topics they discover is limited. Taking the characteristics of microblogs into account, this thesis studies microblog hot topic discovery from a semantic perspective and proposes a topic discovery method based on an improved H-K clustering algorithm. First, based on the time scale and topic persistence of microblog data sets, the thesis improves the clustering algorithm used in hot topic discovery. To improve clustering efficiency on massive microblog data sets, the algorithm is implemented in parallel using the MapReduce programming model in Hadoop. Second, the thesis analyzes microblog text at the semantic level: by introducing a topic model for microblogs, it transforms unstructured microblog text into a document-topic distribution and a topic-feature-word distribution, which reduces the dimensionality of the text. Modeling microblogs semantically in this way improves the accuracy of the similarity calculation between microblog texts.
In the text modeling phase, the LDA topic model is parallelized as well, so that larger microblog data sets can be processed. The experimental results show that the improved clustering algorithm improves clustering efficiency and time performance, and that it is better suited to clustering microblog text, which addresses the low efficiency of traditional clustering algorithms. Introducing a cloud computing platform improves the capacity to process massive microblog text data sets. The proposed hot topic discovery method can find hot topics in microblog data sets accurately and quickly, based on the latent semantics of the feature words in microblog text.
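The abstract does not spell out how the H-K algorithm was improved, so the following is only a minimal, single-machine sketch of the generic H-K idea as it is commonly described: hierarchical (agglomerative) clustering on a small sample chooses the number of clusters and the initial centroids, and K-means then assigns the full data set. It assumes each microblog has already been reduced to a document-topic vector (for example by LDA) and compares vectors with cosine similarity. All function names, the `threshold` parameter, and the sampling step are hypothetical choices for illustration; the thesis's MapReduce parallelization is not reproduced here.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def hierarchical_seeds(sample, threshold):
    """Agglomerative step: repeatedly merge the two clusters whose centroids
    are most similar, until no pair is more similar than `threshold`.
    Returns the centroids of the remaining clusters."""
    clusters = [[v] for v in sample]
    while len(clusters) > 1:
        best, pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = cosine(mean(clusters[i]), mean(clusters[j]))
                if s > best:
                    best, pair = s, (i, j)
        if pair is None:  # no pair is similar enough to merge
            break
        i, j = pair
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i] = clusters[i] + merged
    return [mean(c) for c in clusters]


def kmeans(docs, centroids, iters=20):
    """K-means step: assign each document to its most similar centroid and
    recompute centroids until the assignment stops changing."""
    centroids = [list(c) for c in centroids]
    assign = None
    for _ in range(iters):
        new_assign = [
            max(range(len(centroids)), key=lambda k: cosine(d, centroids[k]))
            for d in docs
        ]
        if new_assign == assign:
            break
        assign = new_assign
        for k in range(len(centroids)):
            members = [d for d, a in zip(docs, assign) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean(members)
    return assign, centroids


def hk_cluster(docs, sample_size, threshold, seed=0):
    """H-K sketch: seed K-means with centroids found hierarchically on a sample."""
    rng = random.Random(seed)
    sample = rng.sample(docs, min(sample_size, len(docs)))
    seeds = hierarchical_seeds(sample, threshold)
    return kmeans(docs, seeds)
```

In this sketch the hierarchical step is quadratic in the sample size, which is why it runs on a sample only, while the K-means assignment pass is the part that would map naturally onto MapReduce: mappers assign documents to the nearest centroid and reducers recompute each centroid.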
[Degree-granting institution]: Harbin Engineering University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP391.1; TP393.092