Research on the Data Storage Model and Parallel Computing Techniques for Query and Analysis of China's Forest Land "One Map"
Source: Chinese Academy of Forestry, doctoral dissertation, 2016. Document type: degree thesis
Keywords: forestry GIS; distributed GIS; distributed spatial database; distributed spatial index; distributed spatial analysis algorithm; distributed task scheduling
[Abstract]: Forest land resource data reflect the current state of, and changes in, forest land nationwide, and are an important basis for management and integrated decision-making by forestry departments and related enterprises. Since construction began, the national forest land "One Map" system has accumulated remote sensing imagery, forest land boundary delineation data, forest land change data, DEM (Digital Elevation Model) data, and more. After preprocessing, these data amount to about 33 TB, making this the largest spatial database in the history of China's forestry. As surveys proceed and application types diversify, the volume and variety of data held by the "One Map" system continue to grow. At this scale, the shortcomings of existing management approaches in efficiency, availability, and scalability are increasingly prominent, and current research offers no suitable overall solution. Against this background, this thesis studies how large-scale spatial data can be organized in a distributed system and how they can be queried and analyzed. It analyzes the deployment and operational problems of traditional GIS architectures and of existing distributed GIS research and, drawing on the characteristics of the "One Map" data, designs a system architecture suited to distributed spatial data storage, spatial query, and spatial analysis. The main techniques used are described in detail, and a prototype system is implemented to validate them. The validation results show that the prototype's spatial queries, spatial analysis, and concurrent spatial access are all efficient and meet the query-time requirements of the national forest land "One Map" system.

The research work is as follows: (1) The data content and application requirements of the national forest land "One Map" system are analyzed, a distributed architecture for the system is established in theory, and three core problems of that architecture are identified: the distributed spatial data storage model, distributed spatial query and spatial analysis algorithms, and the scheduling of distributed spatial computing tasks. (2) Distributed spatial storage model: by designing the organization of key-value data in HDFS (Hadoop Distributed File System), an in-memory distributed database architecture, a spatial data organization built on the distributed database, and a hash-code-based distributed spatial index, the thesis implements a storage model for spatial data in a distributed architecture that avoids the damage to spatial relationships caused by the way earlier studies stored distributed spatial data. Tests show that this storage model makes spatial queries 17 to 70 times faster than the traditional approach. (3) Distributed spatial query and spatial analysis algorithms: the basic logic of distributed spatial analysis is implemented on Hadoop's MapReduce distributed computing framework, together with concrete algorithms for several typical spatial analyses. Tests show that this approach reduces the system performance that complex spatial analysis requires and greatly improves the efficiency of spatial analysis when the computational load is large. (4) Scheduling algorithm for distributed spatial computing tasks: a scheduling algorithm is designed around the idea of a minimum computation quota per user, which guarantees each spatial computing task a baseline amount of computation and assigns tasks, as far as possible, to the compute nodes where their data reside. Tests show that, compared with MapReduce's default algorithm, this algorithm improves average response time by 35-40%, improves average task duration by 15%-20%, and raises the share of tasks that run on local data by 5%-10%.

The innovations of this thesis are as follows: (1) a system architecture that meets the requirements of distributed spatial data storage and of distributed spatial query and spatial analysis; (2) a physical storage model, a logical storage model, and a distributed spatial index for spatial data in a distributed file system; (3) the basic logic of spatial query and spatial analysis in a distributed computing framework, along with several typical distributed spatial analysis algorithms; (4) a scheduling workflow for spatial computing tasks in a distributed system architecture.
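The abstract does not specify how the hash-code-based spatial index is built. As a rough illustration only, the sketch below assumes a Z-order (Morton) code over a regular longitude/latitude grid, used as the key of an in-memory key-value store. Every name here (`interleave_bits`, `cell_of`, `hash_code`, `SpatialKeyValueIndex`) and the 8-bit grid resolution are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def interleave_bits(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code


def cell_of(lon: float, lat: float, bits: int = 8) -> tuple[int, int]:
    """Map a lon/lat pair to (column, row) on a 2^bits x 2^bits global grid."""
    n = 1 << bits
    col = min(int((lon + 180.0) / 360.0 * n), n - 1)
    row = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return col, row


def hash_code(lon: float, lat: float, bits: int = 8) -> int:
    """Hypothetical spatial hash code: the Morton code of the containing cell."""
    col, row = cell_of(lon, lat, bits)
    return interleave_bits(col, row, bits)


class SpatialKeyValueIndex:
    """Toy key-value store keyed by spatial hash code (assumed design, not the thesis's)."""

    def __init__(self, bits: int = 8):
        self.bits = bits
        self.buckets = defaultdict(list)

    def insert(self, feature_id: str, lon: float, lat: float) -> None:
        key = hash_code(lon, lat, self.bits)
        self.buckets[key].append((feature_id, lon, lat))

    def query_bbox(self, min_lon, min_lat, max_lon, max_lat) -> list[str]:
        """Visit only the grid cells overlapping the box, then filter exactly."""
        c0, r0 = cell_of(min_lon, min_lat, self.bits)
        c1, r1 = cell_of(max_lon, max_lat, self.bits)
        hits = []
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                key = interleave_bits(col, row, self.bits)
                for fid, lon, lat in self.buckets.get(key, []):
                    if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat:
                        hits.append(fid)
        return sorted(hits)


index = SpatialKeyValueIndex()
index.insert("a", 116.4, 39.9)
index.insert("b", 121.5, 31.2)
index.insert("c", -73.9, 40.7)
result = index.query_bbox(110.0, 30.0, 125.0, 45.0)
```

Because nearby cells tend to share key prefixes under a Z-order code, a key-value store partitioned by key range would keep spatially close features on the same node, which is one plausible way to preserve spatial relationships in distributed storage.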
[Degree-granting institution]: Chinese Academy of Forestry
[Degree level]: Doctorate
[Year degree conferred]: 2016
[Classification number]: S757
Article ID: 1408106
Article link: http://sikaile.net/shoufeilunwen/nykjbs/1408106.html