Research on Deduplication Techniques for Virtual Machine Image Files
[Abstract]: Virtual machine technology and virtual computing environments are among the most notable achievements in computer science in recent years. A virtual machine image file serves as the carrier for storage and transmission, with its content stored in a specific file format. Cloud computing brings great convenience, but as the number of "one-off" virtual machines created by users grows, so does the number of virtual machine image files on cloud platforms. The resulting redundant data poses a major challenge to cloud computing vendors, so duplicate data in virtual machine image files needs to be removed. Existing work on identical virtual machine image files has shortcomings in how it selects and partitions the deduplication granularity: it uses hashing to deduplicate at the file level and neglects the similarity between virtual machine image files. Deduplication of similar virtual machine image files therefore remains largely unstudied. To address this, this paper designs and implements a SimHash-based hierarchical deduplication scheme that works at different granularities, solving the deduplication problem for similar virtual machine images in order to improve storage space utilization and save network bandwidth. The main contents are as follows: (1) Virtual machine image file formats and the similarity between image files are analyzed in detail. The results show that the image file format is closely related to data redundancy: image files in the same format share more than 60% similar data, so deduplication is worth studying for both identical and similar image files. (2) A hierarchical deduplication scheme for virtual machine image files based on the SimHash algorithm is designed and implemented.
In this scheme, the image file is divided into data blocks using fixed-size chunking. An improved SimHash function computes the SimHash value, which serves as a unique identifier (SimHashID); transmitting the SimHashID ahead of the data reduces network transmission overhead. File similarity is compared hierarchically: the first level operates on whole files and the second level on data blocks. A filter is introduced into the fingerprint lookup to reduce the number of on-disk index lookups. (3) The scheme is tested. Its deduplication ratio, accuracy, feasibility, and stability are measured and compared with the original deduplication scheme. The experimental results show that the scheme is feasible and has some advantage in deduplication ratio and accuracy, saving nearly 60% of storage space, but its stability has shortcomings that require further study.
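The two-level comparison described in item (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: it uses plain SimHash rather than the improved variant (whose details the abstract does not give), it omits the lookup filter, and the 4 KiB block size, the Hamming-distance threshold, the use of SHA-256 as the block fingerprint, and all function names are assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 4096      # assumed block size; the abstract does not state one
FILE_THRESHOLD = 3     # hypothetical Hamming-distance cutoff for "similar" files


def split_fixed(data, size=CHUNK_SIZE):
    """Fixed-size chunking: cut the image into consecutive blocks of `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def simhash(features, bits=64):
    """Plain SimHash: every feature votes +1 or -1 on each bit position."""
    weights = [0] * bits
    for feature in features:
        digest = hashlib.sha1(feature).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for b in range(bits):
            weights[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if weights[b] > 0)


def hamming(a, b):
    """Number of bit positions in which two SimHash values differ."""
    return bin(a ^ b).count("1")


def fingerprint(chunk):
    """Exact block fingerprint used at the second level (assumed: SHA-256)."""
    return hashlib.sha256(chunk).hexdigest()


def new_blocks(new_image, stored_image, threshold=FILE_THRESHOLD):
    """Return the blocks of new_image that still have to be stored.

    Level 1 compares file-level SimHash values; level 2 compares block
    fingerprints, and only runs against the stored image if level 1
    judged the two files similar.
    """
    new_chunks = split_fixed(new_image)
    old_chunks = split_fixed(stored_image)
    distance = hamming(simhash(new_chunks), simhash(old_chunks))
    if distance > threshold:
        index = set()  # dissimilar files: nothing to reuse from stored_image
    else:
        index = {fingerprint(c) for c in old_chunks}
    unique = []
    for chunk in new_chunks:
        fp = fingerprint(chunk)
        if fp not in index:
            index.add(fp)  # also removes duplicates inside new_image
            unique.append(chunk)
    return unique


if __name__ == "__main__":
    base = b"".join(bytes([i]) * CHUNK_SIZE for i in range(16))
    changed = base[:5 * CHUNK_SIZE] + bytes([200]) * CHUNK_SIZE + base[6 * CHUNK_SIZE:]
    print(len(new_blocks(base, base)))                   # identical image
    print(len(new_blocks(changed, base, threshold=64)))  # force level 2
```

In a client-server setting, the file-level SimHash value would be the only thing sent first, so a similar or identical image is detected before any block data crosses the network; the threshold trades missed similarity against wasted block comparisons.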
[Degree-granting institution]: Beijing Jiaotong University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TP333; TP302