Research and Optimization of Small File Processing Techniques in Hadoop
[Abstract]: With the rapid development of the Internet, traditional storage methods can no longer meet the demands of accessing massive data, and the storage and processing of massive data has become a new research topic. The distributed computing platform Hadoop is widely used in cloud computing because of its high reliability, easy scalability, and high fault tolerance. Hadoop processes files in a streaming data access mode and was designed to store large files. As a result, it performs well on large files but stores and accesses small files inefficiently. To address this problem, this paper analyzes existing research and improvement schemes, identifies their advantages and disadvantages, and builds its own improvements on that basis. The proposed design adds an independent small file processing module on top of the original distributed file system. This module merges small files and builds a file index, and it also prefetches files into a cache and transfers data to HDFS for processing. With this architecture, HDFS can process small files without affecting the writing or reading of large files or of merged small files, which improves the storage and access efficiency of the system. The merging and indexing scheme in this paper improves on HAR (Hadoop Archives). A merged file is named according to the time period in which its small files were created. In addition, a Trie index is built from each small file's name and extension, mapping the small file to its data block and to its address within that block. The index is partitioned by file extension, which yields a two-level index mechanism. The index is kept in the small file processing module to speed up retrieval of small files in the system.
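The merging and two-level indexing idea described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the names `TrieNode`, `TwoLevelIndex`, and `merge_small_files`, the merged-file name, and the choice to record a (merged file, byte offset, length) triple instead of an HDFS block address are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical location record: (merged file name, byte offset, length).
Location = Tuple[str, int, int]


@dataclass
class TrieNode:
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    # Set only on the node where a complete base name ends.
    location: Optional[Location] = None


class TwoLevelIndex:
    """Level 1: partition by file extension. Level 2: trie over the base name."""

    def __init__(self) -> None:
        self._partitions: Dict[str, TrieNode] = {}

    @staticmethod
    def _split(filename: str) -> Tuple[str, str]:
        base, dot, ext = filename.rpartition(".")
        # Files without an extension go into the "" partition.
        return (base, ext.lower()) if dot else (filename, "")

    def insert(self, filename: str, merged_file: str, offset: int, length: int) -> None:
        base, ext = self._split(filename)
        node = self._partitions.setdefault(ext, TrieNode())
        for ch in base:
            node = node.children.setdefault(ch, TrieNode())
        node.location = (merged_file, offset, length)

    def lookup(self, filename: str) -> Optional[Location]:
        base, ext = self._split(filename)
        node = self._partitions.get(ext)
        if node is None:
            return None
        for ch in base:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.location


def merge_small_files(files: Dict[str, bytes], merged_name: str,
                      index: TwoLevelIndex) -> bytes:
    """Concatenate small files and record where each one starts in the index."""
    merged = bytearray()
    for name, data in files.items():
        index.insert(name, merged_name, len(merged), len(data))
        merged.extend(data)
    return bytes(merged)
```

A lookup first selects the partition for the extension and then walks the trie over the base name, so the cost depends on the length of the name and not on the number of indexed files. The returned offset and length can be used to slice the original bytes out of the merged file.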
File prefetching draws on each file's metadata and index information, together with the prefetch records kept in the cache pool of the small file processing module, to prefetch both index entries and related files. This paper presents the implementation of the optimization scheme on a Hadoop cluster, including custom MapReduce input splits for small file merging, construction of the two-level index, and other related algorithms. Performance evaluation metrics are also defined to quantify the memory usage efficiency and access efficiency for small files. Finally, experiments compare the proposed small file optimization scheme with the HAR scheme and the original HDFS scheme. The results show that the proposed scheme outperforms both the original HDFS scheme and the HAR scheme in memory usage efficiency and access efficiency.
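The cache pool with prefetching of related files can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the eviction policy or how "related" files are chosen, so the sketch assumes least-recently-used eviction and takes the relatedness rule as a caller-supplied function. The class `PrefetchCache` and its parameters `fetch` and `related` are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable, List


class PrefetchCache:
    """LRU cache pool that also loads related files on a miss (assumed policy)."""

    def __init__(self, capacity: int,
                 fetch: Callable[[str], bytes],
                 related: Callable[[str], List[str]]) -> None:
        self.capacity = capacity
        self._fetch = fetch        # reads one small file from backing storage
        self._related = related    # names of files worth prefetching alongside
        self._pool: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def _put(self, name: str, data: bytes) -> None:
        self._pool[name] = data
        self._pool.move_to_end(name)
        # Evict least recently used entries once over capacity.
        while len(self._pool) > self.capacity:
            self._pool.popitem(last=False)

    def read(self, name: str) -> bytes:
        if name in self._pool:
            self.hits += 1
            self._pool.move_to_end(name)
            return self._pool[name]
        self.misses += 1
        data = self._fetch(name)
        # Prefetch related files first so the requested file ends up most recent.
        for other in self._related(name):
            if other != name and other not in self._pool:
                self._put(other, self._fetch(other))
        self._put(name, data)
        return data
```

In a setup like the one the abstract describes, `related` might return the other small files stored in the same merged file, so that one miss warms the cache for its neighbours. The `hits` and `misses` counters give a simple way to observe the effect on access efficiency.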
[Degree-granting institution]: Hebei University
[Degree level]: Master's
[Year degree awarded]: 2016
[Classification number]: TP333