
Research and Implementation of Access Control and Small-File Storage Strategies for HDFS

Published: 2018-07-28 20:49
[Abstract]: Hadoop was originally designed to store and analyze big data, and it handles large data sets well. HDFS (Hadoop Distributed File System), its underlying storage layer, is low-cost and well suited to large files. However, HDFS has weak access control: although Hadoop currently supports Kerberos user authentication, Kerberos has high overhead and little flexibility. In addition, HDFS supports large files well but small files poorly. When many small files are stored in HDFS, their metadata occupies a large amount of space on the master node and limits the number of files the file system can hold. Small files are also read inefficiently, and reading many of them degrades the I/O performance of the master node. To improve security, Hadoop introduced encryption zones, but these offer only a single encryption algorithm, do not support iterative directory encryption, and do not provide application-level encryption.

This thesis therefore studies three areas (access control, small-file merging, and file encryption) and proposes three optimizations. (1) An access control method that uses a trust value as its metric: feedback-based access control driven by each user's access history strengthens the access control capability of HDFS. (2) Association-rule mining over users' access history: based on the resulting frequent itemsets, small files are merged and the merged files are stored in HDFS, with a two-level cache strategy to improve read efficiency. (3) A pluggable file encryption scheme: data is stored in HDFS as ciphertext to improve data security. The encryption and decryption strategy is implemented in two modes, MapReduce-oriented and client-oriented, with a custom InputFormat so that it supports MapReduce.

Access control, small-file merging, and file encryption were implemented on an experimental cluster and tested with medical images. The results show that the proposed trust-value-based access control performs well and adds little extra time overhead compared with the original HDFS. The small-file merging strategy proves necessary: it greatly reduces the space occupied by metadata and, under a centralized access pattern, achieves a good cache hit rate and improves read efficiency. Encryption was tested in three configurations (no encryption, XOR-AES, and AES) in both the client-oriented and the MapReduce-oriented modes. XOR-AES incurs some time overhead but performs better than AES, and its overhead is small in the MapReduce-oriented mode. The tests indicate that the proposed strategies achieve the expected results.
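The small-file merging idea in optimization (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes plain co-occurrence counting of file pairs as a simplified stand-in for association-rule mining, and it uses in-memory byte strings in place of HDFS files. All function names, the `min_support` parameter, and the `.dcm` file names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(sessions, min_support):
    """Return file pairs that occur together in at least min_support sessions.

    Simplified stand-in for frequent-itemset mining (pairs only).
    """
    counts = Counter()
    for session in sessions:
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_support}


def group_files(names, pairs):
    """Union files linked by a frequent pair into merge groups."""
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        if a in parent and b in parent:  # ignore files we do not store
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


def merge(files, group):
    """Concatenate one group into a blob plus a name -> (offset, length) index."""
    blob = bytearray()
    index = {}
    for name in group:
        data = files[name]
        index[name] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), index


def read_small_file(blob, index, name):
    """Recover one small file from the merged blob using the index."""
    offset, length = index[name]
    return blob[offset:offset + length]


# Hypothetical data: three small files and three access sessions.
files = {"a.dcm": b"AAAA", "b.dcm": b"BB", "c.dcm": b"CCC"}
sessions = [["a.dcm", "b.dcm"], ["a.dcm", "b.dcm", "c.dcm"], ["c.dcm"]]

pairs = frequent_pairs(sessions, min_support=2)
groups = group_files(sorted(files), pairs)
blob, index = merge(files, groups[0])
```

With this data, `a.dcm` and `b.dcm` are accessed together twice and end up in the same merged blob, so one stored object and one index replace two separate metadata entries. A cache in front of `read_small_file` would play the role of the two-level cache the abstract mentions, but its design is not described in the source and is left out here.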
[Degree-granting institution]: Harbin Institute of Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP333; TP309


Article ID: 2151535


Article link: http://sikaile.net/kejilunwen/ruanjiangongchenglunwen/2151535.html

