基于MapReduce的分布式空間連接查詢研究
本文選題:HDFS 切入點(diǎn):MapReduce 出處:《江西理工大學(xué)》2013年碩士論文
【摘要】:近年來,隨著信息化步伐的加快,地理空間信息獲取技術(shù)進(jìn)步日新月異。同時(shí),地理空間數(shù)據(jù)規(guī)模與日俱增,已成為海量數(shù)據(jù)的重要來源之一?臻g連接查詢是一種常用且非常耗時(shí)的復(fù)雜空間查詢操作,特別是在處理大規(guī)?臻g數(shù)據(jù)集時(shí),由于傳統(tǒng)單機(jī)系統(tǒng)和MPI集群系統(tǒng)都難以滿足其對時(shí)空開銷的需求,因此,如何在云計(jì)算環(huán)境中設(shè)計(jì)高效的分布式空間連接查詢算法已成為當(dāng)前學(xué)術(shù)界和產(chǎn)業(yè)界研究的熱點(diǎn)問題。 本文首次嘗試提出了一種云計(jì)算環(huán)境下的分布式QR-樹索引結(jié)構(gòu),,并在該索引基礎(chǔ)上進(jìn)行基于MapReduce的空間連接查詢。本文主要工作如下: (1)提出了一種云計(jì)算環(huán)境下能夠支持大規(guī)模數(shù)據(jù)集的分布式QR-樹索引結(jié)構(gòu),并詳細(xì)介紹了其構(gòu)建的過程。分布式QR-樹索引的構(gòu)建過程可分為以下兩個(gè)步驟:首先,采用基于四叉樹的空間數(shù)據(jù)劃分對空間數(shù)據(jù)集進(jìn)行劃分并分布式存儲(chǔ)在HDFS數(shù)據(jù)塊中;然后,在分割后的每個(gè)子區(qū)域數(shù)據(jù)塊中并行構(gòu)建R樹索引。 (2)在構(gòu)建分布式QR-樹索引基礎(chǔ)上,將分布式QR-樹索引結(jié)構(gòu)與分布式并行計(jì)算框架MapReduce相結(jié)合,設(shè)計(jì)和實(shí)現(xiàn)了基于MapReduce的空間連接查詢算法QRSJ-MR。另外,針對算法中存在的索引并發(fā)訪問問題,采用了實(shí)時(shí)緩存機(jī)制對索引并發(fā)訪問進(jìn)行優(yōu)化。 (3)搭建Hadoop集群環(huán)境,測試基于MapReduce的分布式空間連接算法QRSJ-MR的效率。本文在空間交疊連接查詢和空間包含連接查詢上,分別與非索引的MapReduce空間連接算法和基于R-樹索引的MapReduce空間連接算法做了性能對比實(shí)驗(yàn)。 實(shí)驗(yàn)結(jié)果表明:與非索引的MapReduce空間連接算法和基于R-樹索引的MapReduce空間連接查詢算法相比,無論在空間交疊連接查詢還是在空間包含連接查詢上,QRSJ-MR算法都具有更高的執(zhí)行效率。
[Abstract]:In recent years, with the acceleration of the pace of information technology, geospatial information acquisition technology is changing with each passing day. At the same time, the scale of geospatial data is increasing with each passing day. Spatial join query is a common and time-consuming complex spatial query operation, especially when dealing with large-scale spatial data sets. Because the traditional single-machine system and MPI cluster system can not meet the demand of space-time overhead, how to design an efficient distributed spatial join query algorithm in cloud computing environment has become a hot issue in academia and industry. This paper proposes a distributed QR-tree index structure in cloud computing environment for the first time, and carries on spatial join query based on MapReduce based on this index. The main work of this paper is as follows:. In this paper, a distributed QR-tree index structure which can support large-scale data sets in cloud computing environment is proposed, and the construction process of distributed QR-tree index is introduced in detail. The construction process of distributed QR-tree index can be divided into the following two steps: first of all, Spatial data sets are partitioned based on quadtree and stored distributed in HDFS data blocks, and then R-tree indexes are constructed in parallel in each sub-region data block after segmentation. 2) on the basis of constructing distributed QR-tree index, combining distributed QR-tree index structure with distributed parallel computing framework (MapReduce), a spatial join query algorithm QRSJ-MRbased on MapReduce is designed and implemented. Aiming at the problem of index concurrent access in the algorithm, the real-time cache mechanism is used to optimize the index concurrent access. 3) build Hadoop cluster environment, test the efficiency of QRSJ-MR, a distributed spatial join algorithm based on MapReduce. Performance comparison experiments with non-indexed MapReduce spatial join algorithm and MapReduce spatial join algorithm based on R- tree index are carried out respectively. The experimental results show that the QRSJ-MR algorithm is more efficient than the non-indexed MapReduce spatial join algorithm and the MapReduce spatial join query algorithm based on R- tree index in terms of spatial overlap join query and spatial inclusion join query.
【學(xué)位授予單位】:江西理工大學(xué)
【學(xué)位級別】:碩士
【學(xué)位授予年份】:2013
【分類號】:P208
【參考文獻(xiàn)】
中國期刊全文數(shù)據(jù)庫 前8條
1 唐桂芬;楊偉鋒;黃雙臨;李煒;;一種高效的累進(jìn)式空間連接查詢處理算法[J];電子學(xué)報(bào);2009年02期
2 李建江;崔健;王聃;嚴(yán)林;黃義雙;;MapReduce并行編程模型研究綜述[J];電子學(xué)報(bào);2011年11期
3 劉義;陳犖;景寧;熊偉;;基于R-樹索引的Map-Reduce空間連接聚集操作[J];國防科技大學(xué)學(xué)報(bào);2013年01期
4 潘紅巖;郝忠孝;;基于4CDRS的空間連接查詢[J];哈爾濱理工大學(xué)學(xué)報(bào);2010年04期
5 劉義;陳犖;景寧;劉露;;海量空間數(shù)據(jù)的并行Top-k連接查詢[J];計(jì)算機(jī)研究與發(fā)展;2011年S3期
6 趙清華;陳犖;景寧;;基于Kd樹遞歸區(qū)域劃分的分布式空間連接查詢[J];計(jì)算機(jī)工程與科學(xué);2011年08期
7 楊澤雪;郝忠孝;;受限空間連接查詢及代價(jià)分析[J];哈爾濱工業(yè)大學(xué)學(xué)報(bào);2012年11期
8 回敬齊;李伯權(quán);陳芳芳;;空間數(shù)據(jù)庫R-tree連接方法研究[J];齊齊哈爾大學(xué)學(xué)報(bào)(自然科學(xué)版);2010年04期
本文編號:1679774
本文鏈接:http://sikaile.net/kejilunwen/dizhicehuilunwen/1679774.html