Design and Implementation of a Job Scheduling System for an E-commerce Data Warehouse
[Abstract]: Data has become a core competitive asset for today's Internet enterprises, and the advantage goes to whoever can manage massive data effectively and mine valuable information from it. An efficient job scheduling system is an important tool for managing massive offline data. ETL jobs are the core of a data warehouse's daily work, and a large number of jobs with complex interrelationships can run efficiently and in order only under the management of a job scheduling system. In today's information economy, where data is a productive force, the daily work of an e-commerce data warehouse is no longer simple data backup and log pulling; associating any two data sets may spark new insight. The job scheduling system must therefore not only trigger jobs efficiently and reliably, but also account for the dependencies among them, so that all jobs are ultimately triggered in order as job chains. These requirements pose new challenges for building a job scheduling system. With the arrival of the big data era, big data processing tools based on the Hadoop ecosystem have been widely accepted by the market, and Hive emerged to meet the needs of this era. This system makes Hive data processing an important part of the data warehouse, takes full advantage of the stability and high scalability of a Hadoop cluster, and adopts a distributed cluster to meet e-commerce enterprises' demand for a stable, efficient, and economical data warehouse. The new job scheduling system supports conventional relational database processing and is also compatible with Hive data processing.
At present, most job scheduling systems in enterprises at home and abroad are self-built, although there are also some excellent open source job scheduling systems (such as Oozie) and job scheduling frameworks (such as Quartz). However, their usage scenarios and functions do not fully match the enterprise's requirements at its current stage of development. By summarizing the scheduling requirements of daily work, this thesis designs and develops a customized scheduling engine for the enterprise's e-commerce data warehouse, suited to its current stage of development. Data developers can easily deploy their jobs on any job machine, and the system provides unified and efficient management functions such as schedule-cycle adjustment, flexible addition of dependencies, load balancing, logging, monitoring, and alerting.
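The requirement that jobs fire in order as job chains, with each job waiting for the jobs it depends on, can be illustrated with a topological sort. The following is a minimal sketch and not the thesis's implementation: the function name `trigger_order` and the job names are hypothetical, and a real scheduler would also need to handle schedule cycles, retries, and the choice of job machine.

```python
from collections import deque


def trigger_order(dependencies):
    """Return job names so that every job comes after its upstream jobs.

    `dependencies` maps a job name to the set of jobs it must wait for.
    Raises ValueError if the dependencies contain a cycle.
    """
    # Collect every job, including ones that only appear as upstream jobs.
    jobs = set(dependencies)
    for upstream in dependencies.values():
        jobs.update(upstream)

    # pending[job] = number of upstream jobs that have not finished yet.
    pending = {job: len(set(dependencies.get(job, ()))) for job in jobs}
    downstream = {job: [] for job in jobs}
    for job, upstream in dependencies.items():
        for parent in set(upstream):
            downstream[parent].append(job)

    # Jobs without upstream dependencies can be triggered immediately.
    ready = deque(sorted(job for job, count in pending.items() if count == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        # Finishing a job may release the jobs that were waiting on it.
        for child in sorted(downstream[job]):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)

    if len(order) != len(jobs):
        raise ValueError("cyclic dependency among jobs")
    return order


# Hypothetical job chain: two load jobs wait for the log pull,
# and the report waits for both loads.
chain = {
    "load_orders": {"pull_logs"},
    "load_users": {"pull_logs"},
    "daily_report": {"load_orders", "load_users"},
}
print(trigger_order(chain))
```

Rejecting cyclic dependencies at registration time is one way to keep a flexibly edited dependency graph from stalling the whole chain.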
【Degree-granting institution】: Capital University of Economics and Business
【Degree level】: Master's
【Year degree granted】: 2017
【Classification number】: TP311.13