A Real-Time Analysis and Evaluation System for Weibo Users' Potential Needs Based on the Storm Framework
[Abstract]: With the growth and widespread adoption of the Internet, Weibo, an open platform for exchanging and sharing information, generates hundreds of millions of data items every day. Mining users' potential purchase behavior from this mass of data and analyzing it can create substantial economic value for enterprises. Current research and analysis methods, however, have two shortcomings: Weibo data is not analyzed in real time, so the results lag behind events; and the analysis is not targeted enough, so the value of specific user groups has not been fully exploited. To address these problems, this thesis designs and implements an efficient, real-time system based on Storm for analyzing and evaluating Weibo user behavior.
The work has two parts. First, the thesis identifies the problem of uneven task distribution in Storm's existing scheduling strategies, verifies it experimentally, and proposes an adaptive scheduling model based on CPU weights, which targets the low efficiency caused by delay between internal nodes and by the locality of messages. Second, it designs and implements the real-time analysis system, which is divided into a data source module, a data access module, a data analysis module, and a data display module. The data source module obtains Weibo data through a crawler and the Sina API. The data access module builds a Kafka cluster to solve the problem of data flow delay. The data analysis module implements Storm's Spout and Bolt interfaces, segments the text with Chinese word segmentation, and analyzes it with K-means. A data storage module saves the data in HBase, and the data display module is implemented with SpringMVC and ECharts.
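The analysis step described above (segment the text, then cluster it with K-means) can be illustrated with a minimal sketch. It is not the thesis's implementation, whose details the abstract does not give. It assumes the posts have already been segmented into tokens (the sample tokens are invented English stand-ins for segmented Weibo text), uses plain term-frequency vectors, and seeds the centroids with the first k vectors so the run is deterministic. The helper names `vectorize` and `kmeans` are hypothetical.

```python
import math
from collections import Counter


def vectorize(docs):
    """Turn token lists into term-frequency vectors over a shared vocabulary."""
    vocab = sorted({token for doc in docs for token in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([float(counts[token]) for token in vocab])
    return vocab, vectors


def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, new_labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Invented sample: tokens as they might look after word segmentation.
docs = [
    ["phone", "price", "buy"],
    ["flight", "hotel", "travel"],
    ["phone", "buy", "discount"],
    ["hotel", "travel", "booking"],
]
vocab, vectors = vectorize(docs)
labels, centroids = kmeans(vectors, k=2)
print(labels)
```

In a Storm topology, a step like this would typically sit inside a Bolt that receives segmented posts from an upstream Bolt; that wiring is omitted here.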
The experimental results show that the improved scheduling strategy clearly outperforms the existing scheduling strategies; on CPU-intensive tasks in particular, its performance improves by about 50%. The real-time analysis system can analyze users' potential purchase behavior in real time, and enterprises can carry out related research and marketing based on the behavioral characteristics it identifies.
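To make the contrast between even task distribution and CPU-weighted placement concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the adaptive model proposed in the thesis, which the abstract does not specify. The node names, CPU weights, and task costs are invented, and `round_robin`, `cpu_weighted`, and `max_relative_load` are hypothetical helpers, not Storm APIs.

```python
def round_robin(tasks, nodes):
    """Baseline: hand out tasks in turn, ignoring each node's CPU weight."""
    names = list(nodes)
    assignment = {name: [] for name in names}
    for i, (task, _cost) in enumerate(tasks):
        assignment[names[i % len(names)]].append(task)
    return assignment


def cpu_weighted(tasks, nodes):
    """Greedy placement: each task goes to the node whose load, relative to
    its CPU weight, would be lowest after receiving the task."""
    assignment = {name: [] for name in nodes}
    load = {name: 0.0 for name in nodes}
    # Place the most expensive tasks first (stable for equal costs).
    for task, cost in sorted(tasks, key=lambda item: -item[1]):
        target = min(nodes, key=lambda n: (load[n] + cost) / nodes[n])
        assignment[target].append(task)
        load[target] += cost
    return assignment


def max_relative_load(assignment, tasks, nodes):
    """Largest load-to-weight ratio across nodes; lower means better balance."""
    cost = dict(tasks)
    return max(
        sum(cost[t] for t in assigned) / nodes[name]
        for name, assigned in assignment.items()
    )


# Assumed CPU weights (for example, idle cores) and unit-cost tasks.
nodes = {"node-a": 4.0, "node-b": 2.0, "node-c": 1.0}
tasks = [(f"t{i}", 1.0) for i in range(1, 8)]

even = round_robin(tasks, nodes)
weighted = cpu_weighted(tasks, nodes)
print(max_relative_load(even, tasks, nodes))
print(max_relative_load(weighted, tasks, nodes))
```

With these invented numbers, round-robin overloads the weakest node, while the weighted placement spreads tasks in proportion to CPU weight. An adaptive scheduler would also need to refresh the weights from live CPU measurements, which this sketch leaves out.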
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP393.092; TP311.13