Research and Optimization of File Storage on HDFS

Published: 2018-06-26 12:41

  Topics: cloud storage + Hadoop; Source: Guangdong University of Technology, master's thesis, 2013


[Abstract]: Cloud computing has been widely studied and applied in recent years and has quickly become one of the most active topics in computing. Cloud storage is a concept that grew out of cloud computing, and the HDFS storage system of the Hadoop framework is its best-known example. Studies show that networks hold large amounts of duplicate data, and storing that data repeatedly wastes a great deal of space. In addition, small files are numerous and read/write requests are frequent. Because every request is handled by the single NameNode in an HDFS system, overall system performance can degrade sharply.

The thesis first analyzes the Hadoop system architecture and its implementation techniques, introduces data deduplication techniques, and examines the shortcomings of HDFS when handling large numbers of small files. This provides the theoretical basis for the rest of the work.

Building on the traditional HDFS architecture, the thesis proposes a new HDFS architecture and designs its metadata management and file operation flows. It also designs separate handling strategies for the duplicate data and the small files found in the network. The main research content and contributions are as follows:

(1) A new HDFS architecture based on traditional HDFS is proposed, in which one NameNode is added to each rack to handle that rack's transactions. The thesis analyzes the metadata caching and recovery mechanisms of the main NameNode and the rack-level NameNodes, and redesigns how metadata is obtained during file operations.

(2) For duplicate data, the thesis uses a two-stage check. A keyword extraction strategy is designed first, and a hash is computed over the extracted keywords. Text similarity matching is then combined with the hash result to decide whether the data is a duplicate. This strategy avoids the drawbacks of fixed-length chunk deduplication and judges duplicates more intelligently, saving storage space while improving the accuracy and soundness of deduplication.

(3) For small files, the thesis combines a small-file merging scheme with an analysis of the metadata structure, the cached content, and the update mechanism. It also analyzes and designs in detail the read, write, and delete flows for small files. Merging small files saves storage space, and the rack-level NameNode handles most requests within its own rack, which effectively relieves the load on the main NameNode and further improves system performance.

Finally, simulation experiments are carried out according to the design. The results show improvements of varying degree in the accuracy and soundness of deduplication, in small-file I/O speed, and in NameNode memory and CPU usage, which supports the effectiveness and soundness of the design.
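The two-stage duplicate check in contribution (2) can be illustrated with a minimal Python sketch. The abstract does not specify the thesis's keyword extraction strategy, hash function, or similarity measure, so everything concrete here is an assumption made for illustration: keywords are the most frequent tokens, the hash is SHA-256, similarity is cosine similarity over term counts, and the names `keyword_fingerprint`, `cosine_similarity`, `DedupIndex`, and the 0.9 threshold are hypothetical.

```python
import hashlib
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric tokens (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_fingerprint(text, top_k=5):
    """Stage 1: hash the top_k most frequent tokens (assumed keyword strategy)."""
    counts = Counter(tokenize(text))
    # Rank by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    keywords = sorted(word for word, _ in ranked[:top_k])
    return hashlib.sha256(" ".join(keywords).encode("utf-8")).hexdigest()


def cosine_similarity(a, b):
    """Stage 2: cosine similarity between the term-count vectors of two texts."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


class DedupIndex:
    """Keeps stored texts grouped by keyword fingerprint."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # hypothetical similarity cut-off
        self.by_fingerprint = {}    # fingerprint -> list of stored texts

    def add(self, text):
        """Store text and return True, or return False if judged a duplicate."""
        candidates = self.by_fingerprint.setdefault(keyword_fingerprint(text), [])
        # Only texts that share the fingerprint are compared in full.
        for existing in candidates:
            if cosine_similarity(text, existing) >= self.threshold:
                return False
        candidates.append(text)
        return True
```

In this sketch the fingerprint acts as a cheap filter and the similarity comparison confirms the match, so a text is rejected only when both stages agree. A real system would store block references rather than full texts in the index.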
[Degree-granting institution]: Guangdong University of Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333
