Research on a Hadoop-Based Distributed Storage Strategy for Seismic Data
Keywords: Hadoop; seismic data; distributed; distributed computing. Source: Northeast Petroleum University, 2014 master's thesis. Document type: degree thesis
[Abstract]: Many factors affect efficiency when real seismic data are processed. Broadly, they fall into two groups, software and hardware: the access method and the configuration of the access environment. Continued development and optimization of access methods, together with the need to upgrade server storage and access environments, incurs large costs, and further optimization of the access methods is becoming increasingly difficult.

To address both the bottleneck in access-method optimization and the cost of upgrading storage servers, this thesis studies the storage characteristics of seismic data and, building on Hadoop and current big data storage and access technology, proposes a Hadoop-based distributed storage strategy for seismic data. The strategy is used to optimize the storage and access environment for seismic data and to improve equipment utilization. The specific research contents are as follows:

1. Suitability of Hadoop for distributed storage of seismic data. The data storage structure of the Hadoop distributed framework is examined against the data structure and access characteristics of seismic data, and the organizational structure and cluster configuration factors that distributed storage of seismic data must take into account are considered. By combining Hadoop's data access methods with seismic data access methods, and assuming a cluster of inexpensive machines, an overall framework for the distributed storage strategy is proposed.

2. Organization strategy for distributed storage of seismic data. Based on the characteristics of the Hadoop cluster environment, the block size, block allocation, and data integrity of the seismic data are organized, and the environment parameters are then configured so that the data are stored more efficiently in Hadoop's distributed file system. Experiments are used to identify the environment parameter configuration that best fits the characteristics of seismic data and the best data organization strategy.

3. Design of a Hadoop-based seismic data access module. To further verify the advantages of Hadoop for distributed computing on seismic data, access modules are developed both on Hadoop's MapReduce programming framework and in the current seismic data access environment, and the two are compared. The efficiency of Hadoop-based distributed storage of seismic data is verified by varying the relevant environment parameters, and the effect of the number of distributed nodes and of the data size on data access efficiency is determined.

Finally, the thesis brings these results together, implements each optimization technique, and proposes a complete distributed storage strategy for seismic data, thereby verifying the feasibility and effectiveness of the proposed optimization techniques and methods.
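The abstract does not give the block sizes the thesis settled on, so the following is only a minimal sketch of the kind of calculation that block-size organization involves. It assumes fixed-length trace records stored without a file header (a 240-byte trace header plus 4-byte samples, as in SEG-Y, which the abstract does not name) and an HDFS block size that must be a multiple of a 512-byte checksum chunk. All function names and the numbers in the usage note are hypothetical and are not taken from the thesis.

```python
from math import gcd

# Assumption: HDFS checksum chunk size in bytes (commonly 512 by default).
CHECKSUM_CHUNK = 512


def trace_bytes(samples_per_trace, bytes_per_sample=4, trace_header_bytes=240):
    """Size of one fixed-length trace record (assumed SEG-Y-like layout)."""
    return trace_header_bytes + samples_per_trace * bytes_per_sample


def aligned_block_size(target_bytes, record_bytes, chunk=CHECKSUM_CHUNK):
    """Largest size <= target_bytes that is a multiple of both the
    trace record size and the checksum chunk size."""
    unit = record_bytes * chunk // gcd(record_bytes, chunk)  # least common multiple
    if unit > target_bytes:
        raise ValueError("target block size is smaller than one aligned unit")
    return (target_bytes // unit) * unit


def straddling_traces(n_traces, record_bytes, block_bytes):
    """Count traces whose bytes are split across two blocks."""
    count = 0
    for i in range(n_traces):
        first_byte = i * record_bytes
        last_byte = first_byte + record_bytes - 1
        if first_byte // block_bytes != last_byte // block_bytes:
            count += 1
    return count
```

For example, with a hypothetical 1500 samples per trace, `aligned_block_size(128 * 1024 * 1024, trace_bytes(1500))` gives a block size slightly under 128 MiB at which no trace is split between blocks, so a reader working on one block never needs bytes from a neighbouring block to complete a trace.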
[Degree-granting institution]: Northeast Petroleum University
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TP333
Article ID: 1510391
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1510391.html