Research and Development of a Big Data Storage System Based on HBase

Published: 2018-08-06 09:17

[Abstract]: With the arrival of the big data era, the volume of data held in information system databases has grown explosively, and the performance demanded of reads, writes, and queries keeps rising. Traditional relational databases can no longer meet the requirements of big data storage and querying. To explore storage and query techniques for massive data, this thesis studies and builds on HBase, a typical non-relational (NoSQL) database. HBase is an open-source implementation of Google Bigtable. It is highly reliable, high-performance, column-oriented, scalable, and consistent, and it can support secondary indexing. With HBase, a large-scale storage cluster can be built on inexpensive PC servers to realize a big data storage system. The thesis first studies the architecture of big data storage systems and the key technologies of NoSQL databases and of HBase. It then deploys HBase on a Spark big data platform and stores a floating population database in it. Because HBase natively supports queries only by primary key (row key), a secondary index was added to the floating population database, which greatly improved query speed. On this basis, the performance of the HBase-based floating population database was analyzed and evaluated, and HBase was benchmarked with YCSB, a testing tool developed by Yahoo. The test target was an HBase table built from real data provided by an enterprise, with 30 million records in total. Finally, a prototype system for managing massive floating population data was developed on the Spark big data platform and HBase. The system provides data acquisition, data storage, data management, statistical analysis, and system administration. It stores 30 million records totaling 12.6 GB and achieves efficient storage and fast querying of massive floating population data.
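The secondary-index idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation and does not use the HBase client API. It is a toy in-memory model, assuming a hypothetical `IndexedTable` class and a hypothetical `city` column, that shows why an index table mapping a column value to row keys avoids a full scan when the query is not by row key.

```python
from collections import defaultdict


class IndexedTable:
    """Toy in-memory stand-in for a row-key store plus one index table.

    Hypothetical illustration only; names and columns are assumptions.
    """

    def __init__(self, indexed_column):
        self.indexed_column = indexed_column
        self.rows = {}                   # row key -> {column: value}
        self.index = defaultdict(set)    # column value -> set of row keys

    def put(self, row_key, record):
        # Drop the stale index entry if this row key is being overwritten.
        old = self.rows.get(row_key)
        if old is not None and self.indexed_column in old:
            self.index[old[self.indexed_column]].discard(row_key)
        self.rows[row_key] = dict(record)
        if self.indexed_column in record:
            self.index[record[self.indexed_column]].add(row_key)

    def get(self, row_key):
        # Primary access path: direct lookup by row key.
        return self.rows.get(row_key)

    def scan_by_column(self, value):
        # Without an index: every row must be examined (full table scan).
        return sorted(key for key, rec in self.rows.items()
                      if rec.get(self.indexed_column) == value)

    def lookup_by_index(self, value):
        # With the index: the matching row keys are read directly.
        return sorted(self.index.get(value, ()))


if __name__ == "__main__":
    table = IndexedTable("city")
    table.put("p001", {"name": "A", "city": "Xi'an"})
    table.put("p002", {"name": "B", "city": "Beijing"})
    table.put("p003", {"name": "C", "city": "Xi'an"})
    print(table.lookup_by_index("Xi'an"))
```

Both query methods return the same row keys. The difference is cost: the scan touches every stored row, while the index lookup touches only the matching entries, at the price of extra writes to keep the index consistent.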
[Degree-granting institution]: Xi'an University of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP311.13; TP333



Article ID: 2167265


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2167265.html
