A Storm-Based Real-Time Analysis and Evaluation System for the Latent Needs of Weibo Users

Published: 2019-05-17 22:53

[Abstract]: With the spread and growth of the Internet, Weibo, an open platform for exchanging and sharing information, generates data on the order of hundreds of millions of records every day. Mining users' latent purchase behavior from this mass of data and analyzing it can be of great economic value to enterprises. Current analysis methods, however, have two shortcomings: Weibo analysis is not sufficiently real-time, so results lag behind the data; and it is not sufficiently targeted, so the value of specific user groups is not fully exploited. To address these problems, this thesis designs and implements an efficient, real-time, Storm-based system for analyzing and evaluating Weibo user behavior. The work has two parts. First, it identifies the uneven task distribution produced by Storm's existing scheduling strategies, verifies the problem experimentally, and proposes an adaptive scheduling model based on CPU weights to address the inefficiency caused by latency between internal nodes and by message locality. Second, it designs and implements the real-time analysis system, which consists of a data source module, a data access module, a data analysis module, and a data display module. The data source module obtains Weibo data through a crawler and the Sina API. The data access module uses a Kafka cluster to address data-stream latency. The data analysis module implements Storm's Spout and Bolt interfaces, segments the text with Chinese word segmentation, and clusters the data with K-means. A data storage module persists the data in HBase, and the data display module is built with SpringMVC and ECharts.
Experiments show that the improved scheduling strategy clearly outperforms the existing scheduling strategies, especially on CPU-intensive tasks, where performance improves by about 50%. The system can identify users' latent purchase behavior in real time, and enterprises can use the resulting behavioral features for further research and marketing.
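The clustering step of the data analysis module can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes posts have already been segmented into tokens, represents each post as a term-count vector, and runs plain Lloyd's k-means in standalone Python rather than inside a Storm Bolt. The function names (`vectorize`, `kmeans`) and the sample tokens are hypothetical.

```python
import random
from collections import Counter


def vectorize(docs, vocab):
    """Turn each token list into a term-count vector over a fixed vocabulary."""
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical, already-segmented posts (placeholder tokens, not real data).
posts = [
    ["phone", "price", "buy"],
    ["phone", "buy", "discount"],
    ["travel", "hotel", "ticket"],
    ["hotel", "travel", "booking"],
]
vocab = sorted({token for post in posts for token in post})
centroids, labels = kmeans(vectorize(posts, vocab), k=2)
print(labels)
```

Posts that share a label fall into the same interest group. In a streaming setting the assignment step alone could run per post against centroids computed offline, but that split is an assumption of this sketch, not something the abstract states.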
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP393.092;TP311.13



Article ID: 2479467


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2479467.html
