Research and Optimization of Storage Strategies for the Hadoop Platform
Published: 2018-04-02 16:33
Topic: cloud computing  Focus: Hadoop  Source: Beijing Jiaotong University, master's thesis, 2012
[Abstract]: With economic, social, and technological development, digital information is growing explosively. The spread of information technology and the Internet, together with the arrival of cheap storage devices, has provided both the impetus and the physical basis for storing massive amounts of information. When data volumes are small, storing and backing up data is relatively simple. Once volumes reach the TB or even PB level, storing and backing up that much data becomes a difficult problem, and expectations for storage efficiency and security keep rising. How to store and read data efficiently has therefore become a focus of attention. Cloud computing is a relatively mature approach that addresses data storage and data security effectively and can improve both the security and the storage speed of data. Hadoop is a popular cloud computing framework with the advantages of high reliability, efficiency, scalability, and fault tolerance. Because it is also open source and well suited to both research and practical use, this thesis takes the Hadoop framework as its object of study in cloud computing.

To address the problem of storing massive data efficiently, this thesis analyzes the principles and storage strategy of Hadoop's HDFS (Hadoop Distributed File System), draws on problems encountered when using the Hadoop platform in practice, identifies the limitations and shortcomings of the HDFS data storage strategy, and proposes an optimized storage strategy for HDFS called DIFT (Dstat Iostat Free Top). DIFT bases its decisions on more complete state information about the data nodes. This improves utilization of the cluster's disk and network bandwidth, reduces the likelihood of bottlenecks, improves system performance, and gives the cluster better load balancing and a better user experience.

The main work is as follows. First, the principles of the HDFS model are studied and analyzed in detail in terms of the control node, the data nodes, the data structures of file blocks, and the call relationships among interfaces, classes, and methods, covering how HDFS operates and how its functions are implemented. Second, the implementation of the DIFT storage strategy is studied and designed in terms of data structures, state information, and the heartbeat protocol. Finally, Hadoop source code containing the DIFT storage strategy is compiled and deployed on a Hadoop cluster, and experiments verify and test the strategy's effect. The DIFT strategy is configurable: its design takes the particular circumstances of each user into account, so users can set a policy configuration that matches their actual needs. The experiments show that DIFT improves the storage efficiency of HDFS, enabling the platform to handle the storage of massive data efficiently.

With HDFS running on a stable Hadoop cloud platform built from inexpensive machines and configured with the DIFT storage strategy, the system can meet the needs of practical applications well and is fully capable of serving as a data center platform for enterprises and schools. Because the optimized storage strategy is configurable, users only need to set the policies and thresholds that suit their application, which shortens development cycles for enterprises and schools.
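The abstract states that DIFT chooses storage targets from data node state information (the kind of figures reported by dstat, iostat, free, and top) and that its thresholds are user-configurable, but it gives no formula. The Python sketch below is only an illustration of that general idea under stated assumptions: the `NodeStatus` fields, the `Thresholds` defaults, the equal-weight `load_score`, and the `choose_targets` function are all hypothetical names and values, not the thesis's actual design or any Hadoop API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NodeStatus:
    """Hypothetical load snapshot a data node might report in a heartbeat."""
    name: str
    cpu_util: float   # fraction of CPU busy, 0..1 (as top would show)
    mem_free: float   # fraction of memory free, 0..1 (as free would show)
    disk_util: float  # fraction of disk bandwidth in use, 0..1 (iostat)
    net_util: float   # fraction of network bandwidth in use, 0..1 (dstat)


@dataclass(frozen=True)
class Thresholds:
    """Assumed user-configurable limits; the default values are placeholders."""
    max_cpu: float = 0.9
    min_mem_free: float = 0.1
    max_disk: float = 0.8
    max_net: float = 0.8


def load_score(s: NodeStatus) -> float:
    """Equal-weight average of the four load figures; lower means less loaded."""
    return (s.cpu_util + (1.0 - s.mem_free) + s.disk_util + s.net_util) / 4.0


def choose_targets(nodes: List[NodeStatus], replicas: int,
                   limits: Thresholds = Thresholds()) -> List[str]:
    """Drop nodes that exceed any threshold, then pick the least-loaded ones."""
    eligible = [
        n for n in nodes
        if n.cpu_util <= limits.max_cpu
        and n.mem_free >= limits.min_mem_free
        and n.disk_util <= limits.max_disk
        and n.net_util <= limits.max_net
    ]
    eligible.sort(key=load_score)
    # May return fewer names than requested if too few nodes are eligible.
    return [n.name for n in eligible[:replicas]]


# Made-up sample cluster for illustration only.
cluster = [
    NodeStatus("dn1", cpu_util=0.2, mem_free=0.6, disk_util=0.3, net_util=0.1),
    NodeStatus("dn2", cpu_util=0.5, mem_free=0.5, disk_util=0.95, net_util=0.2),
    NodeStatus("dn3", cpu_util=0.1, mem_free=0.8, disk_util=0.2, net_util=0.1),
    NodeStatus("dn4", cpu_util=0.6, mem_free=0.3, disk_util=0.5, net_util=0.4),
]

if __name__ == "__main__":
    print(choose_targets(cluster, replicas=2))
```

In this sample, `dn2` is excluded because its disk utilization exceeds the assumed 0.8 limit, and the remaining nodes are ranked by the averaged load. A real placement policy would also have to respect HDFS concerns such as rack awareness and remaining capacity, which this sketch leaves out.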
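[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP333
[Citing literature]
Related master's theses (top 2)
1 Dong Qiwen (董其文); Research on Small-File Storage Methods Based on HDFS [D]; Dalian Maritime University; 2013
2 Yang Hao (楊浩); Research on Cloud GIS Spatial Data Storage Management and Sharing [D]; China University of Geosciences (Beijing); 2013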
Article ID: 1701224
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1701224.html