
Research on Replica Strategy for the HDFS Distributed Parallel File System

Published: 2018-08-02 12:10
[Abstract]: In recent years, with the further development of science and technology, the volume of global data has grown rapidly. In particular, the emergence of Web 2.0, which emphasizes user interaction, changed the former role of users as mere readers of the Internet: users have become creators of Internet content. In such a massive-information environment, traditional storage systems can no longer keep up with the rapid growth of information and run into capacity and performance bottlenecks, such as limits on the number of hard disks and servers.
HDFS (Hadoop Distributed File System) is a new distributed file system that differs from traditional distributed parallel file systems: it runs on inexpensive commodity machines and offers high throughput, high fault tolerance, and high reliability. It provides distributed data storage and management together with high-performance data access and interaction.
In the HDFS distributed parallel file system, replicas are a core component. Replica management coordinates the resources of the nodes in the network so that large workloads can be completed efficiently; it does so by improving the effective transfer of data between nodes through replica placement, replica selection, replica adjustment, and related mechanisms.
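For context on how replica counts are exposed in HDFS itself: the default number of replicas is controlled by the dfs.replication configuration property, and the replication factor of an existing file can be changed through Hadoop's public FileSystem API. The minimal Java sketch below shows both; the path and values are illustrative, and it assumes a reachable HDFS cluster and the Hadoop client libraries on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster-wide default replica count for newly created files
        // (normally set in hdfs-site.xml; 3 is the stock HDFS default).
        conf.setInt("dfs.replication", 3);

        FileSystem fs = FileSystem.get(conf);

        // Illustrative path; adjust to an existing file on the cluster.
        Path file = new Path("/data/example.dat");

        // Ask the NameNode to change this file's replica count to 4;
        // HDFS then creates or removes block replicas asynchronously.
        boolean accepted = fs.setReplication(file, (short) 4);
        System.out.println("Replication change accepted: " + accepted);

        short target = fs.getFileStatus(file).getReplication();
        System.out.println("Target replication factor: " + target);
    }
}
```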
This paper first reviews the state of research on replica management strategies, summarizing the existing results in the field and their limitations. On that basis, key HDFS technologies such as the system architecture and the read/write mechanism are analyzed in depth, and a dynamic replica management model for HDFS is built, discussed from two aspects: replica placement and replica deletion. An algorithm is then designed following the improved placement idea: a replica placement strategy based on distance and load information is proposed, with a balance factor introduced to adjust the relative weight of distance and load so as to meet different users' requirements on the system. At the same time, to meet the needs of the replica adjustment phase, the replica deletion strategy is improved: a replica evaluation function is introduced and a replica deletion strategy based on value evaluation is proposed. Finally, simulation experiments verify the effectiveness of the proposed strategies and compare them with the default HDFS replica strategy.
The main contributions of this article are as follows:
1) The differences between the HDFS distributed parallel file system and traditional distributed systems are analyzed, with emphasis on a comparison with GFS: the design ideas and principles of the two systems are examined and the similarities and differences of their replica management strategies are compared, showing that HDFS is a simplified design of GFS with more flexible operation.
2) A replica placement strategy based on distance and load information is proposed. It replaces the random node selection of the default HDFS placement strategy: three factors are taken into account (replica size, transmission bandwidth, and node load), a utility value is computed for each node, and data blocks are preferentially stored on the nodes with the largest utility. A balance factor is introduced so that the weighting of distance against load can be tuned to different users' performance requirements. Simulation experiments show that the proposed algorithm clearly outperforms the default HDFS placement strategy in terms of load balancing. (One possible form of the utility computation is sketched after this list.)
3) A replica deletion strategy based on value evaluation is proposed. When a new replica write request arrives, the Namenode randomly obtains a set of Datanodes and selects one of them to write the data. If the selected node already holds too many replicas and its load is too heavy, its performance cannot be exploited effectively; the default HDFS replica adjustment strategy does not take this into account. The improved strategy computes the value of each replica with an evaluation function and sorts the replicas; when a node becomes overloaded, the replica with the lowest value is deleted, freeing space on the node and making full use of its capacity. Experiments show that, in large-file write tests, the proposed strategy delivers higher performance than the default HDFS strategy. (A sketch of one possible evaluation function follows this list.)
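The abstract does not give the exact form of the node utility function used by the placement strategy in contribution 2). The sketch below is only one plausible reading: it assumes a utility that combines a transfer-time term (replica size divided by available bandwidth, standing in for "distance") with a load term, weighted by a balance factor alpha. The names NodeInfo, utility, and selectBestNode, and the normalisation used, are illustrative and not taken from the thesis.

```java
import java.util.Comparator;
import java.util.List;

/** Illustrative node description; the fields are assumptions, not the thesis's data model. */
class NodeInfo {
    final String id;
    final double bandwidthMBps;   // available bandwidth from the writer to this node
    final double loadRatio;       // current load in [0, 1], e.g. used capacity / total capacity

    NodeInfo(String id, double bandwidthMBps, double loadRatio) {
        this.id = id;
        this.bandwidthMBps = bandwidthMBps;
        this.loadRatio = loadRatio;
    }
}

public class UtilityPlacementSketch {
    /**
     * Utility of storing a replica of the given size on a node.
     * alpha is the balance factor: alpha close to 1 favours short transfer time ("distance"),
     * alpha close to 0 favours lightly loaded nodes. The thesis's actual formula may differ.
     */
    static double utility(NodeInfo n, double replicaSizeMB, double alpha) {
        double transferTime = replicaSizeMB / n.bandwidthMBps;  // seconds, proxy for distance cost
        double distanceScore = 1.0 / (1.0 + transferTime);      // higher is better
        double loadScore = 1.0 - n.loadRatio;                   // higher is better
        return alpha * distanceScore + (1.0 - alpha) * loadScore;
    }

    /** Pick the candidate with the largest utility value for this block. */
    static NodeInfo selectBestNode(List<NodeInfo> candidates, double replicaSizeMB, double alpha) {
        return candidates.stream()
                .max(Comparator.comparingDouble(n -> utility(n, replicaSizeMB, alpha)))
                .orElseThrow(() -> new IllegalArgumentException("no candidate nodes"));
    }
}
```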
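Likewise, the replica evaluation function behind the deletion strategy in contribution 3) is not spelled out in the abstract. The following sketch assumes a simple value based on access frequency, recency, and replica size, and selects the lowest-valued replica for deletion only when the node's load exceeds a threshold; ReplicaRecord, value, and pickVictim are hypothetical names, not the thesis's definitions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Illustrative replica record; the fields are assumptions about what the evaluation function uses. */
class ReplicaRecord {
    final String blockId;
    final long accessCount;        // how often this replica has been read
    final long lastAccessMillis;   // time of the most recent read
    final long sizeBytes;

    ReplicaRecord(String blockId, long accessCount, long lastAccessMillis, long sizeBytes) {
        this.blockId = blockId;
        this.accessCount = accessCount;
        this.lastAccessMillis = lastAccessMillis;
        this.sizeBytes = sizeBytes;
    }
}

public class ValueBasedDeletionSketch {
    /**
     * Assumed evaluation function: replicas that are read often and recently are worth more;
     * large, cold replicas are cheaper to drop. The thesis's actual function may differ.
     */
    static double value(ReplicaRecord r, long nowMillis) {
        double ageHours = Math.max(1.0, (nowMillis - r.lastAccessMillis) / 3_600_000.0);
        double sizeMB = Math.max(1.0, r.sizeBytes / (1024.0 * 1024.0));
        return r.accessCount / (ageHours * sizeMB);
    }

    /**
     * When the node's load exceeds the threshold, pick the lowest-valued replica for deletion;
     * otherwise nothing needs to be removed.
     */
    static Optional<ReplicaRecord> pickVictim(List<ReplicaRecord> replicasOnNode,
                                              double loadRatio, double loadThreshold) {
        if (loadRatio <= loadThreshold || replicasOnNode.isEmpty()) {
            return Optional.empty();
        }
        long now = System.currentTimeMillis();
        return replicasOnNode.stream()
                .min(Comparator.comparingDouble(r -> value(r, now)));
    }
}
```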
[Degree-granting institution]: Zhejiang Normal University
[Degree level]: Master
[Year degree awarded]: 2013
[Classification number]: TP316.4; TP333




本文編號(hào):2159391


Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2159391.html

