Research and Implementation of a Hadoop-Based Telecom Big Data Collection Scheme
Published: 2019-02-09 12:15
[Abstract]: ETL is a critical step in implementing a data warehouse, and designing an ETL process that handles big data effectively, so as to improve the collection efficiency of the operations platform, has significant practical value. The paper first briefly introduces the main data collected by a telecom operator's big data platform. Next, to improve the efficiency of mass data collection, it proposes a hybrid architecture that combines Hadoop and Oracle. It then proposes a dynamically triggered ETL scheduling process and algorithm which, compared with ETL scheduling started at fixed times, effectively shortens the excessively long waiting times of some processes and avoids congestion caused by resource contention. Finally, based on the system run logs of Hadoop and Oracle, it compares and analyzes the relationship between collection efficiency and data volume on the two platforms. Practice shows that the hybrid big data platform combines the complementary strengths of both systems, effectively improves the timeliness of data collection, and achieves good results in application.
[Author affiliations]: China United Network Communications Co., Ltd., Shanghai Branch; School of Software Engineering, Tongji University
[CLC number]: TP311.13
Article ID: 2418947
Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2418947.html