Research on Performance Optimization of Data Deduplication Systems
Published: 2018-03-31 16:07
Topic: data deduplication  Focus: indexing mechanism  Source: Huazhong University of Science and Technology, 2013 master's thesis
[Abstract]: With the continued development of the Internet, the mobile Internet, and information technology, enterprises increasingly recognize the decisive role that data, the carrier of information, plays in their development. In the big data era, the explosive growth of data has drawn growing attention to data deduplication from both academia and industry. The deduplication ratio is an important factor that any storage system with deduplication must consider. Because it relies on file similarity, the Extreme Binning indexing scheme may fail to identify and eliminate a large amount of duplicate data when files have little similarity to one another. Data fragmentation is another pressing problem in deduplication systems: it degrades read performance and therefore leads to poor restore performance.

To further improve the deduplication ratio, a new indexing scheme, Segment Index, is designed and implemented. Unlike Extreme Binning, Segment Index is based on segment similarity rather than the traditional file similarity, so it can better exploit the similarity among data chunks and provide a higher deduplication ratio while placing less load on the system. To address the chunk fragmentation that deduplication introduces, a rewriting strategy, CFL (Chunk Fragmentation Level), is designed and implemented. It computes the system's current degree of fragmentation and decides whether to rewrite certain duplicate chunks in order to improve read performance.

Comprehensive tests show that Segment Index removes 93.02% to 99.91% of duplicate data, whereas Extreme Binning removes 85.15% to 97.46% under the same conditions. With the CFL strategy, read performance is about 58.7% higher than with no rewriting strategy.
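The abstract does not give the internals of Segment Index, so the following is only a minimal sketch of one way a segment-similarity index could work. It assumes fixed-size segments and uses the smallest chunk fingerprint in a segment as that segment's representative, which is the min-hash idea Extreme Binning applies per file. The names `SegmentSimilarityIndex`, `SEGMENT_SIZE`, and `fingerprint` are hypothetical and do not come from the thesis.

```python
import hashlib
from typing import Dict, List, Set

# Chunks per segment. Hypothetical value chosen for the demo, not from the thesis.
SEGMENT_SIZE = 4


def fingerprint(chunk: bytes) -> str:
    """Return the SHA-1 hex digest of a chunk as its content fingerprint."""
    return hashlib.sha1(chunk).hexdigest()


class SegmentSimilarityIndex:
    """Toy in-memory index: representative fingerprint -> bin of chunk fingerprints."""

    def __init__(self) -> None:
        self.bins: Dict[str, Set[str]] = {}
        self.stored_chunks = 0
        self.duplicate_chunks = 0

    def add_segment(self, chunks: List[bytes]) -> None:
        """Deduplicate one segment against the single bin its representative selects."""
        fps = [fingerprint(c) for c in chunks]
        # Assumption: the minimum fingerprint stands in for the whole segment,
        # so similar segments are likely to map to the same bin.
        representative = min(fps)
        bin_ = self.bins.setdefault(representative, set())
        for fp in fps:
            if fp in bin_:
                self.duplicate_chunks += 1
            else:
                bin_.add(fp)
                self.stored_chunks += 1

    def add_stream(self, chunks: List[bytes]) -> None:
        """Split a chunk stream into fixed-size segments and index each one."""
        for start in range(0, len(chunks), SEGMENT_SIZE):
            self.add_segment(chunks[start:start + SEGMENT_SIZE])

    def dedup_ratio(self) -> float:
        """Fraction of all chunks seen so far that were detected as duplicates."""
        total = self.stored_chunks + self.duplicate_chunks
        return self.duplicate_chunks / total if total else 0.0
```

In this sketch, indexing the same eight-chunk stream twice stores eight chunks and detects eight duplicates, so `dedup_ratio()` returns 0.5. A chunk that reappears in a segment with a different representative is not detected and is stored again; that is the trade-off of looking up only one bin per segment instead of a full chunk index.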
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP333
Article ID: 1691457
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1691457.html