

Design and Implementation of a Distributed ETL System for Industrial Big Data

Published: 2018-08-26 10:59
[Abstract]: Since the start of the Industry 4.0 era, the rapid development of the Internet and computer technology, together with its deep integration into industrial systems, has profoundly changed productivity, relations of production, production technology, business models, and modes of innovation, moving the whole industrial system toward comprehensive intelligence. Industrial big data analysis is a key area in which future industry will build competitive advantage in the global market. With the arrival of the Internet of Things and cyber-physical systems, more data can be collected, analyzed, and used to make better-informed decisions. Throughout industrial big data analysis, two tasks form the foundation for everything else: gathering historical data from the various data sources into the analysis system, and loading real-time data from each sensor into it. Both tasks call for an ETL (Extract-Transform-Load) data processing tool. Traditional ETL tools mostly run in parallel on a single machine, and their processing speed and capacity fall far short of what industrial data analysis requires. Commercial ETL tools perform well, but they are expensive and demand high-end hardware, which prevents wide adoption. To address these problems, this thesis designs and implements a low-cost, high-performance distributed ETL system for industrial data processing. The system is organized into three modules: data extraction, data transformation, and data loading. For data extraction, the thesis designs a change data capture (CDC) scheme based on split-table triggers, a differential data synchronization scheme based on data verification, and a real-time data extraction scheme based on the Redis Pub/Sub messaging pattern.
In the data transformation stage, a batch layer and an acceleration (speed) layer are designed to meet different requirements on processing speed and data volume. The batch layer mainly processes historical data with low real-time requirements and is implemented with Hadoop MapReduce; the acceleration layer processes real-time data and is implemented with Spark Streaming. In the data loading stage, structured data is loaded mainly by Sqoop, and unstructured data is loaded through the HDFS client. Finally, functional and performance tests were carried out on the distributed ETL system. The experimental results show that the system performs well on industrial big data workloads, which is of considerable practical significance for the informatization of industrial data processing.
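The batch/acceleration split resembles the batch and speed layers of the Lambda architecture. The plain-Python sketch below imitates that split under stated assumptions: `map_phase`, `shuffle`, and `reduce_phase` mimic the phases Hadoop MapReduce would run across a cluster, `SpeedLayer` stands in for the incremental state a Spark Streaming job would update per micro-batch, and merging both views at query time (`query_mean`) is an assumed design that the abstract does not confirm. The record shape and every name are hypothetical; no Hadoop or Spark APIs are used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, float]  # hypothetical record shape: (device_id, reading)


def map_phase(records: Iterable[Record]) -> List[Tuple[str, Tuple[float, int]]]:
    """Map: emit (key, (partial_sum, partial_count)) for every record."""
    return [(device, (value, 1)) for device, value in records]


def shuffle(pairs: List[Tuple[str, Tuple[float, int]]]) -> Dict[str, List[Tuple[float, int]]]:
    """Shuffle: group mapper output by key, as the framework does between phases."""
    groups: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for key, val in pairs:
        groups[key].append(val)
    return groups


def reduce_phase(groups: Dict[str, List[Tuple[float, int]]]) -> Dict[str, Tuple[float, int]]:
    """Reduce: combine the partial (sum, count) pairs for each key."""
    return {k: (sum(s for s, _ in vs), sum(c for _, c in vs)) for k, vs in groups.items()}


def batch_view(history: Iterable[Record]) -> Dict[str, Tuple[float, int]]:
    """Batch layer: recompute the full view from historical data."""
    return reduce_phase(shuffle(map_phase(history)))


class SpeedLayer:
    """Speed layer: incremental (sum, count) for records newer than the last batch run."""

    def __init__(self) -> None:
        self.state: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0])

    def update(self, micro_batch: Iterable[Record]) -> None:
        for device, value in micro_batch:
            self.state[device][0] += value
            self.state[device][1] += 1


def query_mean(batch: Dict[str, Tuple[float, int]], speed: SpeedLayer, device: str) -> Optional[float]:
    """Serve a query by adding the batch view and the real-time view."""
    b_sum, b_cnt = batch.get(device, (0.0, 0))
    s_sum, s_cnt = speed.state.get(device, (0.0, 0))  # .get does not create entries
    total = b_cnt + s_cnt
    return (b_sum + s_sum) / total if total else None


if __name__ == "__main__":
    batch = batch_view([("pump-01", 2.0), ("pump-01", 4.0), ("pump-02", 10.0)])
    speed = SpeedLayer()
    speed.update([("pump-01", 6.0)])  # one micro-batch arriving after the batch run
    print(query_mean(batch, speed, "pump-01"))  # 4.0
```

Keeping (sum, count) pairs rather than precomputed means lets the two views be merged exactly by addition. Once the next batch run has absorbed the records the speed layer has seen, that speed-layer state would be reset.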
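[Degree-granting institution]: University of Chinese Academy of Sciences (Shenyang Institute of Computing Technology, Chinese Academy of Sciences)
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP311.13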

