Research and Implementation of Access Control and Small-File Storage Strategies for HDFS
[Abstract]: Hadoop was designed from the outset to store and analyze big data, and it has clear advantages for big-data processing. HDFS (Hadoop Distributed File System), its underlying storage layer, is low-cost and well suited to large files. However, HDFS access control is weak. Hadoop currently supports Kerberos user authentication, but Kerberos is costly to deploy and inflexible. HDFS also handles large files well but small files poorly. When many small files are stored in HDFS, their metadata occupies a large amount of memory on the master node (NameNode) and limits the total number of files the file system can hold. Reading small files from HDFS is inefficient, and reading many of them degrades the I/O performance of the master node. To improve Hadoop's security, encryption zones were introduced, but they offer only a single encryption algorithm, do not support nested directory encryption, and provide no application-level encryption. This thesis therefore studies access control, small-file merging, and file encryption, and proposes three optimizations:
(1) An access control method that uses a trust value as its index. Each user's trust value is derived from their access history, and this feedback-based access control strengthens the access control capability of HDFS.
(2) Association rules are mined from users' access history, and small files are merged according to the resulting frequent itemsets before being stored in HDFS. A two-level cache strategy is then used to improve read efficiency.
(3) Data is encrypted in a pluggable way and stored in HDFS as ciphertext to improve data security. Encryption and decryption are implemented in both a MapReduce-oriented mode and a client-oriented mode, and a custom InputFormat allows MapReduce jobs to read the encrypted data.
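The abstract does not give the trust formula, so the following is only a minimal Python sketch of trust-value access control with feedback. It assumes a Beta-reputation style estimate, trust = (good + 1) / (good + bad + 2), and a minimum trust threshold per path prefix. The class `TrustAccessController`, the thresholds, and the longest-prefix rule are hypothetical and are not the thesis's model or any HDFS API.

```python
from collections import defaultdict


class TrustAccessController:
    """Hypothetical trust-value access controller (illustrative, not the thesis's model)."""

    def __init__(self, thresholds: dict[str, float]):
        # Minimum trust required per path prefix (hypothetical sensitivity levels).
        self.thresholds = thresholds
        # Per-user access history: [compliant_count, violation_count].
        self.history: dict[str, list[int]] = defaultdict(lambda: [0, 0])

    def trust(self, user: str) -> float:
        # Beta-reputation style estimate; a user with no history starts at 0.5.
        good, bad = self.history[user]
        return (good + 1) / (good + bad + 2)

    def required_trust(self, path: str) -> float:
        # The longest matching prefix decides the threshold; unmatched paths need 0.0.
        matches = [p for p in self.thresholds if path.startswith(p)]
        return self.thresholds[max(matches, key=len)] if matches else 0.0

    def check(self, user: str, path: str) -> bool:
        # Grant access only if the user's current trust meets the path's threshold.
        return self.trust(user) >= self.required_trust(path)

    def record(self, user: str, compliant: bool) -> None:
        # Feedback step: each audited access updates the history, and therefore the trust.
        self.history[user][0 if compliant else 1] += 1
```

With thresholds `{"/public": 0.0, "/secure": 0.7}`, a new user (trust 0.5) can read `/public` but not `/secure`. After three compliant accesses, trust rises to 4/5 = 0.8 and `/secure` opens. Later violations push it back below the threshold. This is how history feeds back into access decisions in this sketch.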
Access control, small-file merging, and file encryption were implemented on an experimental cluster. Compared with the original HDFS, the proposed trust-value-based access control performs well and adds little time overhead. The small-file merging strategy proves necessary: it greatly reduces the space occupied by metadata, achieves a good cache hit ratio under concentrated access patterns, and improves read efficiency. No encryption, XOR-AES, and AES were compared in both the client-oriented and MapReduce-oriented modes. XOR-AES incurs some time overhead but outperforms AES, and its overhead is small in the MapReduce-oriented mode. Overall, the tests show that the strategies proposed in this thesis achieve the expected results.
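The small-file merging and two-level caching strategy summarized above can also be illustrated with a minimal Python sketch. It assumes access sessions are available as lists of file names. It mines frequently co-accessed itemsets with a plain Apriori pass and greedily merges each frequent itemset into one merged file. Reads then go through an L1 cache of individual files, backed by an L2 cache of whole merged files. The function and class names, the greedy grouping rule, the leftover bundle, and the cache sizes are hypothetical. `_load_merged_file` only stands in for a real HDFS read.

```python
from collections import OrderedDict


def frequent_itemsets(sessions, min_support):
    """Apriori: count itemsets level by level, keeping those with support >= min_support."""
    transactions = [frozenset(s) for s in sessions]
    current = list({frozenset([i]) for t in transactions for i in t})
    result, k = {}, 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
        keys = list(frequent)
        # Candidates of size k are unions of two frequent (k-1)-itemsets.
        current = list({a | b for a in keys for b in keys if len(a | b) == k})
    return result


def merge_groups(sessions, min_support):
    """Hypothetical grouping: largest, most-supported frequent itemsets first, disjoint."""
    freq = frequent_itemsets(sessions, min_support)
    candidates = sorted((s for s in freq if len(s) >= 2),
                        key=lambda s: (-len(s), -freq[s], sorted(s)))
    assigned, groups = set(), []
    for s in candidates:
        if not (s & assigned):
            groups.append(sorted(s))
            assigned |= s
    leftovers = sorted({f for sess in sessions for f in sess} - assigned)
    if leftovers:
        groups.append(leftovers)  # files without strong co-access share one bundle
    return groups


class LRU:
    """Small least-recently-used cache built on OrderedDict."""

    def __init__(self, capacity):
        self.capacity, self.data = capacity, OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry


class TwoLevelCache:
    """L1 caches single small files; L2 caches whole merged files (hypothetical design)."""

    def __init__(self, groups, l1_size, l2_size):
        # Map each small file to the index of the merged file that contains it.
        self.group_of = {f: i for i, g in enumerate(groups) for f in g}
        self.groups = groups
        self.l1, self.l2 = LRU(l1_size), LRU(l2_size)
        self.stats = {"l1": 0, "l2": 0, "hdfs": 0}

    def read(self, name):
        data = self.l1.get(name)
        if data is not None:
            self.stats["l1"] += 1
            return data
        gid = self.group_of[name]
        block = self.l2.get(gid)
        if block is not None:
            self.stats["l2"] += 1
        else:
            self.stats["hdfs"] += 1
            block = self._load_merged_file(gid)
            self.l2.put(gid, block)
        data = block[name]
        self.l1.put(name, data)
        return data

    def _load_merged_file(self, gid):
        # Placeholder for one HDFS read of a whole merged file; returns {name: bytes}.
        return {f: f"<contents of {f}>".encode() for f in self.groups[gid]}
```

Co-accessed files share a merged file, so a miss on one file also brings its neighbours into the L2 cache. In this sketch, that is why concentrated access patterns tend to give high hit ratios.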
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year conferred]: 2017
[CLC classification]: TP333; TP309