Design and Performance Analysis of an HBase-Based Satellite Spatial Data Query System
Published: 2018-11-04 11:40
[Abstract]: As space technology and information technology converge, the range of data sensing and acquisition has expanded greatly, and the stock of satellite spatial data is growing rapidly. Satellite spatial data exhibits the 4V characteristics (Volume, Variety, Value, Velocity), and traditional SQL databases, limited in scalability and parallelism, cannot provide the storage and processing capabilities that analysis of such data requires. Hadoop and HBase, which have developed rapidly in recent years on the basis of parallel computing and scalable storage, offer an effective way to store and query massive data.

Without a sound overall system design, however, several problems arise. Because satellite data is time-ordered and HBase sorts Rowkeys lexicographically, data written to the system easily creates hotspots, which degrade load balancing and both storage and query performance. As the volume of imported spatial data grows, Regions split and merge repeatedly, which hurts write performance. In addition, for range queries over multidimensional spatial data, a column-based HBase query requires a full table scan, so query efficiency is too low to meet practical requirements.

This thesis addresses these problems in the design of both storage and query. For storage, it proposes a hashed Rowkey design for spatial data together with a pre-partitioning scheme, which avoids hotspots, balances load across the system, and improves write performance. For query, it proposes the GKD-HBase index model, which combines a Grid index and a KD-tree as the first and second index respectively, and uses a Hilbert space-filling curve to reduce multidimensional data to one dimension for querying, thereby improving query efficiency.

Finally, the storage and query performance of the designed system is tested and analyzed. The results show that Rowkey hashing of massive spatial data combined with pre-partitioning avoids hotspots in the cluster and achieves load balance, and that write performance is best when the Region size is 7 GB. In a big-data environment, the proposed GKD-HBase index performs range queries over massive multidimensional spatial data efficiently, with a significant performance advantage over the Grid index, and provides strong support for practical HBase-based satellite spatial data query applications. Association analysis of the query results can uncover a large amount of latent information about maritime or aerial targets (for example, identifying and tracking maritime and aerial targets from aviation data, which in turn depends on real-time storage and fast querying of massive satellite spatial data). The storage and query design of this system improves both storage and query performance and therefore has practical application value.
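The abstract does not give the actual Rowkey layout or index structures, so the following is only a minimal sketch of the three general techniques it names: mapping a 2-D grid cell to a one-dimensional Hilbert curve position, prefixing the Rowkey with a hash-derived bucket (salting), and deriving split keys for pre-partitioning. The function names, the key format, the use of MD5, and the longitude/latitude grid are all assumptions made for illustration; none of them come from the thesis.

```python
import hashlib
from typing import List, Tuple


def to_cell(lon: float, lat: float, order: int) -> Tuple[int, int]:
    """Map lon/lat to a cell on a 2**order x 2**order grid (assumed layout)."""
    n = 1 << order
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y


def hilbert_index(order: int, x: int, y: int) -> int:
    """Return the position of cell (x, y) along a Hilbert curve of the given order."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def salted_rowkey(hilbert_d: int, timestamp_ms: int, num_buckets: int) -> bytes:
    """Build a hypothetical Rowkey: '<bucket>_<hilbert>_<timestamp>'.

    The bucket prefix is derived from a hash of the rest of the key, so
    rows with adjacent timestamps are spread across buckets.
    """
    body = f"{hilbert_d:010d}_{timestamp_ms:013d}"
    digest = hashlib.md5(body.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % num_buckets
    return f"{bucket:02d}_{body}".encode()


def split_points(num_buckets: int) -> List[bytes]:
    """Return num_buckets - 1 split keys, i.e. one region per bucket prefix."""
    return [f"{b:02d}_".encode() for b in range(1, num_buckets)]
```

The split keys returned by `split_points` would be passed to the table-creation call of whichever HBase client is in use, so that every bucket prefix starts in its own Region. One trade-off of this kind of salting is that a range query over Hilbert positions has to be issued once per bucket prefix and the results merged, since rows for the same spatial range are spread over all buckets.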
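[Degree-granting institution]: Beijing University of Chemical Technology
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TP333; TP311.13
Article ID: 2309720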
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2309720.html