Research on On-Demand Adjustment of HDFS Data Replicas and Their Placement Strategy
Topic: cloud storage + data replicas. Source: Lanzhou University of Technology, master's thesis, 2013
【Abstract】: The sustained, rapid development of information technology has placed unprecedented demands on data storage and on the computation performed over data sets. Research institutions, governments, and enterprises all face the high cost of storing massive data, difficult data management, high computational complexity, and poor fault tolerance. Cloud storage emerged to address these problems. It is a system that treats data as its primary resource and provides the underlying data storage for cloud computing. It organizes the dispersed, heterogeneous, independent, and massive storage systems on a network into a single reliable and secure logical whole under unified management, and so offers users efficient, highly reliable, and transparent service.
Data replication is an indispensable data management technique in cloud storage systems. Working on an HDFS cloud storage cluster, this thesis studies the following aspects of data replication: how to determine the data block size, when to create replicas, how many replicas to create, when to delete replicas, and where to place them.
To address these questions, the thesis does the following. First, it builds a dynamic adjustment model for file data block size, together with a replica creation model and a replica deletion model. Second, it builds a default model and a dynamic model for replica placement and proposes a hierarchical rack selection algorithm and a data node selection algorithm; in the dynamic model, the number of replicas can be adjusted on demand. The quality of the block size strategy directly affects how Map/Reduce tasks are allocated, how file data blocks are managed, and how the network performs, so files must be split into blocks with both the characteristics of the environment and the needs of users in mind. Once a suitable block size has been chosen, the file data must be written to the cluster in a way that again reflects the characteristics of the cloud storage system and user requirements. The cluster must also settle the degree of replica redundancy, that is, how many replicas to create for each file data block. Given the conditions under which replicas are created, redundant replicas must in turn be deleted to keep the cluster serving efficiently. For replica placement, the thesis aims to reduce and optimize the transfer of file data within the HDFS cloud storage cluster, thereby saving network bandwidth and improving the Map/Reduce computing performance of the HDFS cluster, and it divides the placement strategy into a default replica placement strategy and a dynamic replica placement strategy.
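The link between block size, Map/Reduce task count, and replica overhead described in the abstract can be illustrated with a little arithmetic. The sketch below is not the thesis's model; it only assumes the common default in which a job over a single file gets roughly one map task per block, and that each replica stores the file's actual bytes. The function names `block_count` and `raw_storage` are hypothetical, chosen for illustration.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def block_count(file_size: int, block_size: int) -> int:
    """Number of blocks a file occupies; the last block may be partial."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (file_size + block_size - 1) // block_size


def raw_storage(file_size: int, replication: int) -> int:
    """Bytes consumed across the cluster when every block has `replication` copies."""
    if replication < 1:
        raise ValueError("replication must be >= 1")
    # Assumption: a partial last block only occupies its actual length.
    return file_size * replication


if __name__ == "__main__":
    size = 1 * GiB
    for bs in (64 * MiB, 128 * MiB, 256 * MiB):
        # Assumption: one map task per block (split size equal to block size).
        print(f"block size {bs // MiB} MiB: {block_count(size, bs)} blocks / map tasks")
    for r in (2, 3, 5):
        print(f"replication {r}: {raw_storage(size, r) // GiB} GiB of raw storage")
```

Smaller blocks mean more map tasks and more block metadata to manage, while larger blocks mean fewer, longer tasks. Each additional replica adds one full copy of the file to the cluster's storage and write traffic, which is the trade-off that on-demand replica adjustment tries to balance.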
【Degree-granting institution】: Lanzhou University of Technology
【Degree level】: Master's
【Year degree granted】: 2013
【Classification number】: TP333