

Research on Deduplication Techniques for Virtual Machine Image Files

Published: 2018-11-05 14:08
[Abstract]: Virtual machine technology and virtual computing environments are among the most notable achievements in computer science in recent years. A virtual machine image file is the carrier for storing and transferring a virtual machine. It holds the machine's contents in a specific file format and makes cloud computing considerably more convenient. However, as users create more and more "one-off" virtual machines, the number of image files on cloud platforms grows sharply, and the resulting redundant data poses a major challenge to cloud providers. Deduplicating virtual machine image files is therefore necessary. Existing work on deduplicating identical image files has shortcomings in how it chooses and partitions the deduplication granularity: it applies hashing at the file level and overlooks the similarity between image files, so deduplication of similar image files remains unexplored. To address this, the thesis proposes, designs, and implements a hierarchical, multi-granularity deduplication scheme based on SimHash. The scheme handles similar image files, improves storage utilization, and saves network bandwidth. The main contributions are as follows.

(1) Virtual machine image files, their formats, and the similarity between image files are analyzed in detail. The results show that the image format is closely tied to data redundancy and that image files of the same format share more than 60% similar data, which demonstrates the need to study deduplication of both identical and similar image files.

(2) A hierarchical deduplication scheme for virtual machine image files based on the SimHash algorithm is designed and implemented. The scheme splits an image file into data blocks using fixed-size chunking, computes each block's SimHash value with an improved SimHash function, and uses that value as the block's unique identifier. SimHash IDs are transmitted ahead of the data to reduce network overhead. Files are compared for similarity and deduplicated at two levels: the first level operates on whole files, the second on data blocks. A filter is added to the fingerprint lookup to reduce the number of on-disk index accesses.

(3) The implemented scheme is tested experimentally. Deduplication ratio, deduplication accuracy, feasibility, and stability are measured and compared against the existing deduplication scheme. The results show that the scheme is feasible, has an advantage in deduplication ratio and accuracy, and saves nearly 60% of storage space. It falls short on stability, which requires further study.
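The pipeline described in point (2) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. Everything below is an assumption made for illustration: the 4096-byte block size, the 64-bit fingerprint width, the textbook SimHash over 8-byte shingles (the thesis uses an improved variant that the abstract does not describe), the choice of a Bloom filter as the lookup filter (the abstract only says "a filter"), and the hypothetical names `simhash`, `BloomFilter`, and `DedupStore`.

```python
import hashlib

CHUNK_SIZE = 4096   # assumed block size; the abstract does not state one
BITS = 64           # assumed fingerprint width


def simhash(data: bytes, shingle: int = 8) -> int:
    """Textbook SimHash over fixed-width byte shingles (not the improved variant)."""
    counts = [0] * BITS
    for i in range(0, max(len(data) - shingle + 1, 1), shingle):
        digest = hashlib.blake2b(data[i:i + shingle], digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        for b in range(BITS):
            counts[b] += 1 if (h >> b) & 1 else -1
    # A bit is set when more features voted 1 than 0 at that position.
    return sum(1 << b for b in range(BITS) if counts[b] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small values suggest similar inputs."""
    return bin(a ^ b).count("1")


class BloomFilter:
    """In-memory pre-check so most unseen fingerprints skip the index lookup."""

    def __init__(self, size_bits: int = 1 << 16, hashes: int = 3):
        self.size, self.hashes, self.bits = size_bits, hashes, 0

    def _positions(self, key: int):
        raw = key.to_bytes(8, "big")
        for seed in range(self.hashes):
            d = hashlib.blake2b(raw, digest_size=8, salt=bytes([seed])).digest()
            yield int.from_bytes(d, "big") % self.size

    def add(self, key: int) -> None:
        for p in self._positions(key):
            self.bits |= 1 << p

    def __contains__(self, key: int) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(key))


class DedupStore:
    def __init__(self):
        self.files = {}    # file SimHash -> ordered list of chunk SimHash IDs
        self.chunks = {}   # chunk SimHash -> bytes; stands in for the on-disk index
        self.seen = BloomFilter()
        self.index_lookups = 0

    def put(self, image: bytes):
        """Return (level that resolved the upload, number of chunks newly stored)."""
        file_id = simhash(image)
        if file_id in self.files:                    # level 1: whole file known
            return "file", 0
        recipe, new = [], 0
        for i in range(0, len(image), CHUNK_SIZE):   # level 2: fixed-size blocks
            chunk = image[i:i + CHUNK_SIZE]
            cid = simhash(chunk)
            known = False
            if cid in self.seen:                     # consult index only on a filter hit
                self.index_lookups += 1
                known = cid in self.chunks
            if not known:
                self.chunks[cid] = chunk
                self.seen.add(cid)
                new += 1
            recipe.append(cid)
        self.files[file_id] = recipe
        return "chunk", new

    def get(self, file_id: int) -> bytes:
        """Rebuild an image from its stored chunk recipe."""
        return b"".join(self.chunks[c] for c in self.files[file_id])
```

In a client-server setting, only the fingerprints computed in `put` would need to travel first, and block data would follow only for IDs the server does not already hold. Because this sketch treats equal SimHash values as duplicates, two different blocks that happen to share a fingerprint would be merged. That is one reason deduplication accuracy has to be measured separately from the deduplication ratio. `hamming` is included to show how two fingerprints could be compared for similarity rather than strict equality.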
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP333;TP302


Article ID: 2312313


Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2312313.html

