Kafka-Based Distributed Caching and Analysis Platform for Large-Scale Stream Data
[Abstract]: In recent years, information technology and Internet applications have developed continuously, the global volume of data has grown explosively, and the era of big data has arrived. This will not only bring great changes to scientific research but will also profoundly affect every aspect of daily life. In big data analysis and computing, distributed cluster architectures are used more and more widely because of their low cost, high computing power, and good scalability. At the same time, the data processed and analyzed on these clusters is increasingly diverse in structure. As e-commerce, the Internet of Things, Internet finance, and similar applications continue to grow, most distributed clusters must handle both dynamic stream data transmitted by monitoring terminals and runtime log files generated by the system. Because these kinds of data have different characteristics, they call for different analysis algorithms and computation methods: stream processing requires real-time response and diverse processing topologies, whereas batch processing of large-scale data requires high system throughput and resource utilization. However, existing mainstream distributed cluster systems, such as Hadoop [19] [21], Storm [22], and S4 [23], are each generally suited to one specific kind of data and cannot accommodate multiple coexisting data types. This paper proposes a Kafka-based distributed caching and analysis platform for large-scale stream data. The platform organizes and caches the large-scale stream data entering the system, provides an online stream processing unit and an offline batch processing unit, and performs analysis and computation according to the type of data.
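The abstract does not include the platform's code. As a rough illustration of the cache-then-dispatch idea, the Python sketch below replaces Kafka with a hypothetical in-memory `PartitionedLog` (append-only partitions, offsets held by the consumer). It also uses a hypothetical `Dispatcher` that routes each record by an assumed `kind` tag: `"stream"` records go to an online handler one at a time, and `"log"` records are accumulated and handed to a batch handler. This is neither the Kafka client API nor the author's implementation.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    key: str    # e.g. a terminal or host ID; selects the partition
    kind: str   # hypothetical tag: "stream" (monitoring data) or "log" (runtime log)
    value: str


class PartitionedLog:
    """Hypothetical in-memory stand-in for a Kafka topic (not the Kafka API)."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]

    def append(self, record: Record) -> int:
        # Same key -> same partition, so per-key order is preserved.
        p = zlib.crc32(record.key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append(record)
        return p

    def read(self, partition: int, offset: int, max_records: int) -> List[Record]:
        # Reads delete nothing; each consumer keeps its own offsets.
        return self.partitions[partition][offset:offset + max_records]


class Dispatcher:
    """Hypothetical consumer feeding an online unit and a batch unit."""

    def __init__(self, log: PartitionedLog,
                 on_stream: Callable[[Record], None],
                 on_batch: Callable[[List[Record]], None],
                 batch_size: int) -> None:
        self.log = log
        self.offsets = [0] * len(log.partitions)
        self.on_stream = on_stream
        self.on_batch = on_batch
        self.batch_size = batch_size
        self.pending: List[Record] = []

    def poll(self, max_records: int = 100) -> None:
        for p in range(len(self.log.partitions)):
            records = self.log.read(p, self.offsets[p], max_records)
            self.offsets[p] += len(records)
            for r in records:
                if r.kind == "stream":
                    self.on_stream(r)          # online unit: handle each record at once
                else:
                    self.pending.append(r)     # batch unit: accumulate for throughput
                    if len(self.pending) >= self.batch_size:
                        self.flush()

    def flush(self) -> None:
        if self.pending:
            self.on_batch(self.pending)
            self.pending = []


if __name__ == "__main__":
    log = PartitionedLog(num_partitions=3)
    for i in range(5):
        log.append(Record(f"sensor-{i % 2}", "stream", f"reading {i}"))
        log.append(Record("node-1", "log", f"log line {i}"))
    online: List[Record] = []
    batches: List[List[Record]] = []
    d = Dispatcher(log, online.append, batches.append, batch_size=2)
    d.poll()
    d.flush()  # hand over the incomplete final batch
    print(len(online), [len(b) for b in batches])  # 5 [2, 2, 1]
```

The log retains records and the consumer only advances its own offsets. A burst of input therefore just makes the log longer, and the processing units drain it at their own pace. This is the buffering role the abstract assigns to the message system.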
The characteristics of the caching and analysis platform can be summarized as follows. (1) A distributed message system serves as the cache for large-scale stream data. This improves the platform's ability to cope with sudden changes in the rate of dynamic stream input. (2) An online real-time processing unit and an offline batch processing unit are designed and implemented to process cluster data with different characteristics. Together they meet the differing requirements of each data type for real-time computation and system throughput. (3) The whole platform is managed centrally: node information from the different modules and processing units is synchronized to the management module, so that node information is globally consistent across the platform. This paper describes the overall architecture of the platform in detail. The system is divided into three parts: cache subscription, online real-time processing, and system management. Based on this design, a Kafka-based distributed caching and analysis platform model for large-scale stream data is implemented. Finally, its usability, extensibility, and efficiency are verified. Through the design and implementation of this platform, the paper aims to provide new ideas and methods for building distributed computing clusters and processing large-scale data. With further work, it is hoped that the platform model can be improved and applied in daily life, production, and research.
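Characteristic (3) can be illustrated with a small Python sketch. The sketch assumes, hypothetically, that every node reports its state and a timestamp to one management module. That module keeps the only authoritative copy of node information and increments a version number on every change. It also drops nodes whose last report is older than a timeout, and it returns snapshots copied under a lock so that readers always see a consistent view. The class and method names and the timeout rule are illustrative assumptions. The abstract does not describe the platform's actual synchronization protocol.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class NodeInfo:
    node_id: str
    unit: str         # hypothetical unit names: "cache", "online", "batch"
    status: str       # free-form status reported by the node, e.g. "busy"
    last_seen: float  # time of the node's latest report


class ManagementModule:
    """Hypothetical central registry: the single authoritative copy of node information."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.version = 0                      # bumped on every change to the global view
        self._nodes: Dict[str, NodeInfo] = {}
        self._lock = threading.Lock()

    def report(self, node_id: str, unit: str, status: str, now: float) -> int:
        # Nodes push their state here; the caller supplies the timestamp.
        with self._lock:
            self._nodes[node_id] = NodeInfo(node_id, unit, status, now)
            self.version += 1
            return self.version

    def expire(self, now: float) -> List[str]:
        # Remove nodes that have not reported within `timeout` time units.
        with self._lock:
            dead = [n for n, info in self._nodes.items()
                    if now - info.last_seen > self.timeout]
            for n in dead:
                del self._nodes[n]
            if dead:
                self.version += 1
            return dead

    def snapshot(self) -> Tuple[int, Dict[str, Tuple[str, str]]]:
        # Copy under the lock so readers never see a half-applied update.
        with self._lock:
            return self.version, {n: (i.unit, i.status) for n, i in self._nodes.items()}


if __name__ == "__main__":
    mgr = ManagementModule(timeout=10.0)
    mgr.report("broker-1", "cache", "up", now=0.0)
    mgr.report("online-1", "online", "busy", now=1.0)
    mgr.report("batch-1", "batch", "idle", now=2.0)
    print(mgr.expire(now=10.5))  # ['broker-1']
    print(mgr.snapshot())  # (4, {'online-1': ('online', 'busy'), 'batch-1': ('batch', 'idle')})
```

In practice, a coordination service such as Apache ZooKeeper is a common choice for this role, and Kafka clusters of that period already depended on it. The sketch only illustrates a centralized, versioned view of node state.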
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year conferred]: 2016
[Classification number]: TP311.13