Research on Distributed Storage of Spatiotemporal Data
Published: 2018-07-21 19:12
[Abstract]: Spatiotemporal data is multidimensional data with a highly complex structure and both spatial and temporal characteristics. It records in detail the spatial state of objects and how they change over space and time, and it can correctly represent an object's past, present, and future states. As technology develops rapidly, data-collection devices grow more varied and data volumes grow quickly, which makes data storage and management difficult. The quality of the design and implementation of the spatiotemporal data storage management module determines the capability of the whole data management system, and in turn affects the running efficiency of upper-layer application systems. Distributed frameworks are attractive for their efficient parallel computing, large storage capacity, high scalability, and high stability. Building on previous research, this thesis explores distributed storage of spatiotemporal data. It starts from spatiotemporal data and distributed-systems theory, studies the relevant technologies and principles, and proposes a spatiotemporal index based on the R-tree. It then uses HBase, from the open-source cloud platform Hadoop, as the database, applies the efficient computing capability of MapReduce to manage spatiotemporal data, and finally verifies index performance through experiments. The main research contents are as follows:
1) The advantages and disadvantages of classical spatiotemporal data models and spatiotemporal indexes are analyzed in depth, and the characteristics of distributed platforms and related technologies are briefly analyzed, providing theoretical and technical support for the thesis.
2) The core components of the open-source cloud platform Hadoop are analyzed systematically: the MapReduce parallel computing framework, the HDFS distributed file storage system, and the data model of HBase, a column-oriented key-value database built on HDFS. Given the large volume of spatiotemporal data, HBase large tables are proposed for storing and managing it. Based on the characteristics of spatiotemporal data and of HBase, the table-creation process, row key design, and column family definition are described in detail.
3) Based on existing spatiotemporal indexes, a spatiotemporal index built on the R-tree is proposed. The index stores past and present data separately, and each tree manages start and end times, raising tree utilization in order to improve query efficiency. Comparative experiments then test the insertion and query efficiency of the proposed index.
4) Finally, experimental data is generated with a GPS simulator and stored in HBase for management.
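The abstract does not give the internal layout of the proposed index, so the following is only a minimal sketch of the one idea it does state: records that are still current are kept apart from records that have ended, and each record carries a start and an end time. Everything else here is an assumption made for illustration: the names `Entry` and `TwoStoreIndex`, the half-open validity interval [start, end), and above all the use of a plain dictionary and list with linear scans where the thesis uses R-trees.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Axis-aligned bounding rectangle: (min_x, min_y, max_x, max_y).
Rect = Tuple[float, float, float, float]


@dataclass
class Entry:
    """One version of an object's position (hypothetical record layout)."""
    obj_id: str
    rect: Rect
    start: int                 # time at which this version became valid
    end: Optional[int] = None  # None while the version is still current


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class TwoStoreIndex:
    """Keeps current and past versions in separate stores.

    Assumption: plain containers with linear scans stand in for the two
    R-trees described in the abstract; only the separation is illustrated.
    """

    def __init__(self) -> None:
        self.current: Dict[str, Entry] = {}  # at most one open version per object
        self.past: List[Entry] = []          # closed versions, valid on [start, end)

    def update(self, obj_id: str, rect: Rect, t: int) -> None:
        """Record a new position at time t, closing any previous version."""
        old = self.current.get(obj_id)
        if old is not None:
            old.end = t
            self.past.append(old)
        self.current[obj_id] = Entry(obj_id, rect, t)

    def query(self, window: Rect, t0: int, t1: int) -> List[Entry]:
        """Return versions that overlap `window` at some time in [t0, t1]."""
        hits = [
            e for e in self.past
            if e.start <= t1 and e.end > t0 and intersects(e.rect, window)
        ]
        # Current versions have no end time, so only the start is checked.
        hits += [
            e for e in self.current.values()
            if e.start <= t1 and intersects(e.rect, window)
        ]
        return hits


if __name__ == "__main__":
    idx = TwoStoreIndex()
    idx.update("car1", (0, 0, 1, 1), 10)
    idx.update("car1", (5, 5, 6, 6), 20)   # closes the first car1 version
    idx.update("car2", (0, 0, 2, 2), 15)
    for e in idx.query((0, 0, 3, 3), 12, 18):
        print(e.obj_id, e.rect, e.start, e.end)
```

In this sketch, a query that only concerns the present never has to look at closed versions, which is one plausible reading of why the separation might help query efficiency; the thesis's own measurements are not reproduced here.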
[Degree-granting institution]: Jiangxi University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: P208
Article ID: 2136559
Article link: http://sikaile.net/kejilunwen/dizhicehuilunwen/2136559.html