Design and Implementation of a Job Scheduling System for an E-commerce Data Warehouse

Published: 2018-11-18 18:31

【Abstract】: Data has become a core competitive asset of today's Internet companies, and an efficient job scheduling system is an essential tool for managing massive volumes of offline data. Whoever can manage these data effectively and mine the valuable information in them holds the strategic high ground. ETL jobs are the core of a data warehouse's daily work, and a large number of jobs with complex interrelationships can run efficiently and in order only under the management of a job scheduling system. In the current information economy, where data is a productive force, the daily work of an e-commerce data warehouse is no longer simple data backup and log collection: any data that can be linked to other data may yield new insights. A job scheduling system must therefore trigger jobs efficiently and reliably while respecting the dependencies among them, ultimately triggering all jobs in order as job chains. These requirements pose new challenges for building a job scheduling system. With the arrival of the big data era, big data processing tools built on the Hadoop ecosystem have gained wide market acceptance, and Hive emerged to meet the needs of this era. This system treats support for Hive data processing as a key part of the data warehouse, exploiting the stability and high scalability of Hadoop clusters and using a distributed cluster to meet e-commerce companies' demand for a stable, efficient, and economical data warehouse. The new job scheduling system therefore supports both conventional relational database processing and Hive data processing. At present, most data warehouse job scheduling systems at major companies in China and abroad are built in-house. Good open-source job scheduling systems (such as Oozie) and job scheduling frameworks (such as Quartz) also exist, but their usage scenarios and features do not match the needs of the company at its current stage of development. By summarizing the scheduling requirements encountered in daily work, this thesis designs and develops a customized job scheduling engine for an e-commerce data warehouse, suited to the company's current stage of development. Data developers can easily deploy their jobs on any job machine, and the engine provides unified, efficient management that includes periodic triggering, flexible addition of dependencies, load balancing, logging, and monitoring with alerts.
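The abstract requires jobs to be triggered in order as dependency-respecting job chains and balanced across job machines, but it does not say how the engine does this. What follows is one possible approach, not the thesis's implementation: it assumes the dependencies form a directed acyclic graph, releases jobs in topological order (Kahn's algorithm), and sends each released job to the machine with the fewest jobs assigned so far. The function `schedule`, the job names, and the machine names are hypothetical, and the plan is static: it assumes every upstream job finishes successfully before its downstream jobs run.

```python
from collections import deque
import heapq


def schedule(deps: dict[str, list[str]], machines: list[str]) -> list[tuple[str, str]]:
    """Return (job, machine) pairs in a dependency-respecting trigger order.

    Hypothetical sketch: deps maps each job to the jobs it depends on.
    A job is released once all of its upstream jobs have been released
    (assumed to have finished). Each released job goes to the machine with
    the fewest jobs assigned so far, ties broken by machine name.
    """
    # Collect every job, including ones that appear only as dependencies.
    jobs = set(deps)
    for ups in deps.values():
        jobs.update(ups)

    # In-degree = number of unmet upstream jobs; downstream = reverse edges.
    indegree = {job: 0 for job in jobs}
    downstream: dict[str, list[str]] = {job: [] for job in jobs}
    for job, ups in deps.items():
        for up in set(ups):  # ignore duplicate dependency entries
            indegree[job] += 1
            downstream[up].append(job)

    # Jobs with no upstream dependencies start the chains.
    ready = deque(sorted(j for j in jobs if indegree[j] == 0))
    load = [(0, m) for m in machines]  # min-heap of (assigned_jobs, machine)
    heapq.heapify(load)

    plan: list[tuple[str, str]] = []
    while ready:
        job = ready.popleft()
        count, machine = heapq.heappop(load)  # least-loaded machine
        plan.append((job, machine))
        heapq.heappush(load, (count + 1, machine))
        # Releasing this job may make some downstream jobs ready.
        for child in sorted(downstream[job]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(plan) != len(jobs):
        raise ValueError("dependency cycle detected; jobs cannot form a chain")
    return plan
```

For example, with two raw-layer jobs feeding one detail job that in turn feeds a daily report job, the two raw jobs are released first and spread across both workers, and the report job is released last. A production engine would instead release downstream jobs only after real success callbacks and would weigh machine load by running work rather than by a simple job count.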
【Degree-granting institution】: Capital University of Economics and Business
【Degree level】: Master's
【Year conferred】: 2017
【CLC classification number】: TP311.13
