Design and Implementation of a Real-Time Data Reporting System for Large Data Volumes

Published: 2018-04-21 13:24

Topic: massive data + big data; Source: Beijing Jiaotong University, 2016 master's thesis

[Abstract]: In the report-query business of an intelligent catering system, merchants have a strong need for summaries of their business data, and a reporting system meets that need directly. Generating reports from existing data involves a large number of ad hoc queries across multiple database tables, most of which include table joins. When the total data volume is below the ten-million-record level, the traditional approach (querying the database directly) can be tuned to respond within ten seconds. Once the queried data reaches tens of millions, hundreds of millions, or even a billion records, however, no amount of tuning or index redesign lets the traditional approach meet the requirement for fast responses under concurrent queries, and the queries themselves put heavy load on the database. The company where I interned currently uses offline computation: data is imported into a data warehouse (Hive), computed offline, and the result set is then queried. Its drawback is that ad hoc queries are not possible.

This thesis presents another approach, which addresses these problems by introducing a distributed index layer and which is used in many big-data ad hoc query scenarios. In the data synchronization module, multiple tables from the relational database (MySQL) are merged into a single wide table to preserve data integrity, and the fast query capability of a search engine (Solr) is used to improve query efficiency. In a wide-table query scenario with 50 million records and 20 concurrent requests per second, results are returned within 2 seconds and every query succeeds. Neither the traditional approach (querying the database directly) nor offline computation can deliver this combination of query speed and data freshness.

The thesis describes in detail the design and implementation of the full data synchronization module, the incremental data synchronization module, and the report business module. Only when the full and incremental synchronization modules work together can the data in the distributed index remain both accurate and up to date; the report business module then queries that data according to business requirements and returns real-time report data to the user. In the full synchronization module, Java multithreading combined with intelligent scheduling of the synchronization threads greatly increases synchronization speed. The real-time (incremental) synchronization module is built on Alibaba's MySQL data synchronization component and message middleware, and ensures that incremental data is synchronized to the distributed index in real time.

I independently completed the following: in the full synchronization module, the sub-table import submodule, the Hive binding submodule, the Hive wide-table synthesis submodule, and the index file generation submodule; in the incremental synchronization module, the incremental message publisher submodule and the incremental message consumer submodule; and in the report business module, the membership submodule. The project has passed testing and has been deployed to production, where it runs normally and provides users with real-time, accurate report data.
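The wide-table idea in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the table names (`order`, `member`), the column names, and the class `WideTableSketch` are all hypothetical, and plain in-memory maps stand in for MySQL rows. The point is only that the join is done once, at synchronization time, so that each indexed document already carries the columns a report would otherwise fetch through a join at query time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: table and column names are invented for illustration.
class WideTableSketch {

    /** Joins each order row with its matching member row into one flat "wide" row. */
    static List<Map<String, Object>> buildWideRows(
            List<Map<String, Object>> orders,
            Map<Long, Map<String, Object>> membersById) {
        List<Map<String, Object>> wideRows = new ArrayList<>();
        for (Map<String, Object> order : orders) {
            Map<String, Object> wide = new LinkedHashMap<>();
            // Prefix every column with its source table so names cannot collide.
            order.forEach((col, val) -> wide.put("order_" + col, val));
            Object memberId = order.get("member_id");
            Map<String, Object> member =
                    memberId == null ? null : membersById.get(memberId);
            if (member != null) {
                member.forEach((col, val) -> wide.put("member_" + col, val));
            }
            // Orders without a matching member are kept, like a LEFT JOIN.
            wideRows.add(wide);
        }
        return wideRows;
    }

    public static void main(String[] args) {
        Map<String, Object> order = new HashMap<>();
        order.put("id", 1001L);
        order.put("member_id", 7L);
        order.put("amount", 88.5);

        Map<String, Object> member = new HashMap<>();
        member.put("name", "Li");
        member.put("level", "gold");

        Map<Long, Map<String, Object>> membersById = new HashMap<>();
        membersById.put(7L, member);

        List<Map<String, Object>> orders = new ArrayList<>();
        orders.add(order);

        // Each printed map is one document that could be sent to the index.
        System.out.println(buildWideRows(orders, membersById));
    }
}
```

The cost of this design is that a change to a member row has to be propagated to every wide row that embeds it, which is one reason an incremental synchronization path is needed alongside the full one.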
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP311.52
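The abstract states that full and incremental synchronization must cooperate for the index to stay accurate and current. One common way to make an incremental consumer safe against duplicate or out-of-order message delivery is a per-document version check, sketched below. This is an assumption about how such a consumer could work, not the thesis's design: `ChangeEvent`, `InMemoryIndex`, and the `version` field are hypothetical, an in-memory map stands in for the distributed index, and no Solr, message-middleware, or Alibaba component API is used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an idempotent incremental-sync consumer.
class IncrementalSyncSketch {

    enum Op { UPSERT, DELETE }

    /** One change captured from the source database; field names are assumptions. */
    static final class ChangeEvent {
        final Op op;
        final String docId;
        final long version; // e.g. a monotonically increasing log position
        final Map<String, Object> fields;

        ChangeEvent(Op op, String docId, long version, Map<String, Object> fields) {
            this.op = op;
            this.docId = docId;
            this.version = version;
            this.fields = fields;
        }
    }

    /** In-memory stand-in for the distributed index. */
    static final class InMemoryIndex {
        private final Map<String, Map<String, Object>> docs = new HashMap<>();
        private final Map<String, Long> versions = new HashMap<>();

        /** Applies the event only if it is newer than the last one seen for that id. */
        boolean apply(ChangeEvent e) {
            long seen = versions.getOrDefault(e.docId, -1L);
            if (e.version <= seen) {
                return false; // stale or duplicate delivery: ignore
            }
            if (e.op == Op.DELETE) {
                docs.remove(e.docId);
            } else {
                docs.put(e.docId, new HashMap<>(e.fields));
            }
            // The version is kept even after a delete, so a late UPSERT
            // with an older version cannot resurrect the document.
            versions.put(e.docId, e.version);
            return true;
        }

        Map<String, Object> get(String docId) {
            return docs.get(docId);
        }
    }

    public static void main(String[] args) {
        InMemoryIndex index = new InMemoryIndex();
        Map<String, Object> fields = new HashMap<>();
        fields.put("amount", 10);
        System.out.println(index.apply(new ChangeEvent(Op.UPSERT, "o1", 2L, fields)));
        System.out.println(index.apply(new ChangeEvent(Op.UPSERT, "o1", 1L, fields)));
        System.out.println(index.get("o1"));
    }
}
```

With a check like this, a full synchronization pass and the incremental stream can overlap in time: whichever writes the newer version of a document wins, and replays are harmless.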

Article ID: 1782620


Article link: http://sikaile.net/kejilunwen/sousuoyinqinglunwen/1782620.html
