A Preliminary Study of Storage and Management Mechanisms for a Spatial Information Service Cloud
Topic: spatial information service cloud + differentiated data redundancy; Source: Chengdu University of Technology, master's thesis, 2012
[Abstract]: A cloud service uses computer hardware, software, information security, networking, spatial information, communication, virtualization, clustering, storage, and parallel computing technologies to bring together large numbers of resources distributed across a network under unified management and scheduling, forming a large resource pool that serves users on demand and scales easily. Cloud services can be classified in several ways. By service mode they are divided into public, private, and hybrid clouds; by architecture, into the SaaS, PaaS, and IaaS models; and by how users use them, into file clouds, device clouds, and application clouds.
Many cloud services are on the market today, but few are combined with spatial information. Spatial information is information that describes geographic location, the distribution of spatial entities, and their temporal and spatial characteristics. According to incomplete statistics, more than 80% of the information humans can obtain is related to spatial information. Spatial information data is typically large in volume, varied in type, complex in structure, highly specialized, drawn from many sources, and strongly real-time. Current storage and management of such data falls short in three respects: there is no capability for rapidly sensing spatial information data, no storage and management mechanism for massive spatial information data, and no backup and disaster recovery mechanism for it. Providing a spatial information service cloud is therefore of great significance.
Starting from the current state of cloud services, and taking the theoretical system and system architecture of the G/S model as its theoretical basis and related research topics and projects as its support, this thesis analyzes the characteristics of spatial information data and the characteristics and requirements of a spatial information service cloud, and compares mainstream storage, parallel computing, and distributed computing technologies. Building on HDFS, an open-source distributed file system that is both stable and widely used, and using the distributed computing framework MapReduce as the programming model, it studies the overall architecture of the spatial information service cloud and its internal data storage and management mechanisms. The main results are as follows:
(1) The characteristics of spatial information data are studied and summarized under four headings: massive volume, multiple sources, heterogeneity, and spatio-temporal attributes.
(2) The specific requirements of a spatial information service cloud are studied. Against these requirements, the architecture and application areas of several storage technologies are compared, including RAID, the FastDFS file system, the MooseFS file system, and HDFS, and the parallel computing technology MPI is compared with the distributed computing framework MapReduce. HDFS and MapReduce are selected as the base architecture and programming model because they suit the characteristics of spatial information data and meet the requirements of the spatial information service cloud.
(3) The issues to watch for in preventing single-point failure of the master node when building the spatial information service cloud platform are studied, together with a concrete configuration scheme.
(4) Based on how frequently users access data in the system, and by adding system log files, a MapReduce application is written to handle differentiated data redundancy. This makes the system more practical while saving storage cost and management resources.
(5) A data update mechanism tailored to spatial information data is proposed: an incremental update mechanism that retains historical data.
(6) Finally, the whole system is tested. Theory is combined with practice and verified by experiment, so that concrete experimental data and methods can be used to improve the theory.
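The log-driven redundancy job described in result (4) can be illustrated with a minimal sketch. The abstract does not give the log format, the thresholds, or the replica counts, so everything named here (the three-field log line, `TIERS`, and the function names) is an assumption made for illustration. Plain Python stands in for the map and reduce phases of what the thesis describes as a Hadoop MapReduce application.

```python
from collections import defaultdict

# Hypothetical access-log format, one record per line:
#   "<timestamp> <user> <hdfs_path>"

def map_access(line):
    """Map phase: emit (path, 1) for each well-formed access record."""
    parts = line.split()
    if len(parts) == 3:
        yield parts[2], 1

def reduce_counts(pairs):
    """Reduce phase: sum the emitted counts for each path."""
    totals = defaultdict(int)
    for path, n in pairs:
        totals[path] += n
    return dict(totals)

# Assumed tiers: (minimum access count, replica count), highest tier first.
TIERS = [(100, 5), (10, 3), (0, 2)]

def replication_for(count, tiers=TIERS):
    """Pick the replica count of the first tier whose threshold is met."""
    for min_count, factor in tiers:
        if count >= min_count:
            return factor
    return tiers[-1][1]

def plan_replication(log_lines):
    """Return {path: replica count} for every path seen in the log."""
    pairs = (kv for line in log_lines for kv in map_access(line))
    counts = reduce_counts(pairs)
    return {path: replication_for(c) for path, c in counts.items()}
```

The sketch only computes a plan. On a real cluster the planned value would still have to be applied to each file, for example with the `hadoop fs -setrep` shell command; how the thesis applies it is not stated in the abstract.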
On this basis, the thesis makes the following innovations:
(1) A mechanism for differentiated data redundancy based on file access frequency is proposed. Differentiated data redundancy means dynamically changing the number of redundant copies of a file according to how often its data is accessed, in order to obtain better file access performance while using the least disk space.
(2) An incremental update mechanism for spatial information data that retains historical data is proposed. The mechanism is designed around the characteristics of spatial information data: users can understand how spatial features change over time by comparing historical data with the latest data.
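One way to picture an incremental update mechanism that retains history is an append-only store of per-update deltas. The abstract does not describe the thesis's actual data layout, so the class below, its method names, and the convention that `None` marks a deleted feature are all hypothetical.

```python
class VersionedLayer:
    """Append-only store: each update records only the features that changed."""

    def __init__(self):
        # One dict per update: feature_id -> attributes, or None for a deletion.
        self._deltas = []

    def update(self, changes):
        """Record one increment and return its 1-based version number."""
        self._deltas.append(dict(changes))
        return len(self._deltas)

    def snapshot(self, version=None):
        """Rebuild the layer as of `version` (latest if omitted) by replaying deltas."""
        if version is None:
            version = len(self._deltas)
        state = {}
        for delta in self._deltas[:version]:
            for fid, attrs in delta.items():
                if attrs is None:
                    state.pop(fid, None)
                else:
                    state[fid] = attrs
        return state

    def diff(self, old, new):
        """Return {feature_id: (old_attrs, new_attrs)} for features that differ."""
        a, b = self.snapshot(old), self.snapshot(new)
        return {
            fid: (a.get(fid), b.get(fid))
            for fid in sorted(a.keys() | b.keys())
            if a.get(fid) != b.get(fid)
        }
```

Because earlier deltas are never overwritten, `snapshot(1)` still returns the first state after later updates, and `diff(1, 3)` lists what changed between two versions, which is the kind of historical comparison the abstract describes.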
[Degree-granting institution]: Chengdu University of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP333
Article ID: 1816235
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1816235.html