A Kafka-Based Distributed Cache and Analysis Platform for Large-Scale Stream Data

Published: 2018-08-09 10:45
[Abstract]: In recent years, the continuous development of information technology and Internet applications has driven explosive growth in global data volume, and the era of big data is arriving. This will bring major changes to scientific research and will profoundly affect many aspects of daily life. In big data analysis and computing, distributed cluster architectures are now widely used because of their low cost, high computing power, and good scalability. At the same time, the data analyzed in such clusters is becoming more diverse. With the growth of e-commerce, the Internet of Things, Internet finance, and similar applications, most distributed clusters now hold both dynamic stream data sent from monitoring endpoints and runtime log files generated by the system. Data with different characteristics calls for different analysis algorithms and computing modes: stream processing demands real-time response and flexible topologies, while large-scale batch processing demands high system throughput and resource utilization. Existing mainstream distributed cluster systems, such as Hadoop [19][21], Storm [22], and S4 [23], are each generally suited to only one kind of data and cannot handle the coexistence of several data types. This thesis proposes a Kafka-based distributed cache and analysis platform for large-scale stream data. The platform is designed to organize and cache the large-scale stream data entering the system. It provides separate processing units for online stream processing and offline batch processing, so that each type of data is analyzed in a suitable mode.
The main characteristics of the platform are as follows. (1) A distributed messaging system serves as the cache for large-scale stream data, which improves the platform's ability to adapt to sudden changes in the volume of incoming stream data. (2) An online real-time processing unit and an offline batch processing unit are designed and implemented to process data with different characteristics in the cluster, meeting the differing requirements of each data type for real-time computation and system throughput. (3) The whole platform is managed centrally: node information from the different modules and processing units is synchronized to a management module, which keeps the platform's node information globally consistent. The thesis describes the overall architecture in detail and divides the system into three parts, which implement cache subscription, online real-time processing, and system management. A model of the platform is implemented from this design, and its availability, scalability, and efficiency are verified. Through the design and implementation of the platform, the thesis aims to offer new ideas and methods for building distributed computing clusters and for processing large-scale stream data. With further work, the platform model could be refined and applied in everyday use, production, and research.
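Characteristics (1) and (2) can be illustrated with a small sketch. It is not the thesis's implementation and does not use Kafka itself: `TopicLog`, `online_unit`, `batch_unit`, the record fields, and the thresholds are hypothetical names chosen for illustration. The sketch assumes only the general idea of a shared append-only log read by consumer groups that each keep their own offset, so a burst of input is buffered once and then drained by a low-latency path and a high-throughput path at their own pace.

```python
from collections import defaultdict


class TopicLog:
    """In-memory stand-in for a message log: append-only, one read offset per consumer group."""

    def __init__(self):
        self._records = []
        self._offsets = defaultdict(int)  # consumer group -> index of next unread record

    def append(self, record):
        self._records.append(record)

    def poll(self, group, max_records):
        """Return up to max_records unread records for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch

    def lag(self, group):
        """Number of records this group has not read yet."""
        return len(self._records) - self._offsets[group]


def online_unit(log, alerts, threshold=90):
    """Low-latency path: read a few records per call and react to each one."""
    for rec in log.poll("online", max_records=2):
        if rec["value"] > threshold:
            alerts.append(rec["id"])


def batch_unit(log, min_batch=5):
    """Throughput path: wait until enough records accumulate, then aggregate them in one pass."""
    pending = log.lag("batch")
    if pending < min_batch:
        return None
    batch = log.poll("batch", max_records=pending)
    return sum(r["value"] for r in batch) / len(batch)


log = TopicLog()
alerts = []

# A burst of six records arrives at once; the log absorbs it.
for i, value in enumerate([10, 95, 20, 99, 30, 40]):
    log.append({"id": i, "value": value})

# The online unit drains the burst in small steps.
while log.lag("online") > 0:
    online_unit(log, alerts)

# The batch unit reads the same records independently of the online unit.
batch_mean = batch_unit(log)
```

Because each group tracks its own offset, the online unit finishing its reads does not remove anything the batch unit still needs, which is the decoupling the cache layer is meant to provide.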
[Degree-granting institution]: Jilin University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.13




Article ID: 2173828


Article link: http://sikaile.net/jingjilunwen/dianzishangwulunwen/2173828.html

