

Design and Implementation of an HDFS-Based Storage System for Massive Small Files

Published: 2018-06-10 00:19

  Topics: massive small-file storage + distributed file systems; Source: National University of Defense Technology, master's thesis, 2012


[Abstract]: In recent years, both enterprise and personal data have grown explosively. Google CEO Eric Schmidt has said that the world now creates as much data every two days as was produced from the dawn of human civilization through 2003. How to store data at this scale has become a major challenge for current storage systems. Traditional centralized storage can no longer meet the demand, which has led to distributed file systems for large-scale data storage, such as Google File System (GFS), Hadoop Distributed File System (HDFS), PVFS, and Lustre.

These distributed file systems scale well and tolerate faults, and they can meet the needs of massive data storage. Many applications, however, must store massive numbers of small files in addition to large ones. Although distributed file systems such as GFS and HDFS store large files efficiently, they are very inefficient at storing massive numbers of small files. Industry and academia have proposed many approaches to this problem, but these commonly suffer from low performance, limited system reliability, and inefficient storage of small-file metadata. To address these challenges, this thesis designs and implements an HDFS-based storage system for massive small files.

The main design idea is to keep the existing HDFS directory tree structure and pack the small files within a folder into one large file for storage, called a small-file data file. A small-file index is generated at the same time, recording each small file's position in the corresponding data file.

The system designed and implemented in this thesis is a scalable, highly fault-tolerant, distributed cluster storage system for massive small files. The thesis proposes a small-file aggregation storage technique that stores small-file data in HDFS data files, giving the data distributed storage and fault tolerance. It also proposes a distributed small-file index management technique that spreads the index across the data nodes, removing the bottleneck that a single metadata node becomes when storing massive numbers of small files. An index fault-tolerance mechanism protects the index and so reduces the risk of losing small files. Creating multiple data files in a single directory resolves conflicts when accessing small files in the same directory. On top of this, the client caches the index locations of frequently used small files and information about the data file streams, which improves file access efficiency.

Experiments show that the system's small-file read/write latency and throughput improve substantially over native HDFS without small-file support. The system also effectively addresses the problem of excessively large metadata in massive small-file storage, and the index fault-tolerance mechanism improves its reliability.
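The aggregation idea described in the abstract (pack a folder's small files into one data file and record each file's position in an index) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: it uses an in-memory `io.BytesIO` buffer in place of an HDFS data file, and the names `pack_small_files`, `read_small_file`, and the `(offset, length)` index layout are assumptions chosen for illustration only.

```python
import io
from typing import BinaryIO, Dict, Iterable, Tuple

# Hypothetical index layout: small-file name -> (offset, length) in the data file.
Index = Dict[str, Tuple[int, int]]


def pack_small_files(files: Iterable[Tuple[str, bytes]], data_file: BinaryIO) -> Index:
    """Append each small file to the data file and record where it landed."""
    index: Index = {}
    data_file.seek(0, io.SEEK_END)  # always append after existing content
    for name, payload in files:
        offset = data_file.tell()
        data_file.write(payload)
        index[name] = (offset, len(payload))
    return index


def read_small_file(name: str, index: Index, data_file: BinaryIO) -> bytes:
    """Look up a small file in the index and read only its byte range."""
    offset, length = index[name]
    data_file.seek(offset)
    return data_file.read(length)


# Stand-in for one folder's small files; the buffer stands in for the
# "small-file data file" that the real system would store in HDFS.
folder = [("a.txt", b"alpha"), ("b.txt", b"bravo!"), ("c.txt", b"")]
data_file = io.BytesIO()
index = pack_small_files(folder, data_file)

print(index)
print(read_small_file("b.txt", index, data_file))
```

In this sketch the file system only has to track one large file per folder, while per-file lookups go through the index. The thesis additionally distributes and replicates that index and uses several data files per directory, none of which is modeled here.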
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP333



