Research on Data Deduplication Technology in Information Storage Systems
[Abstract]: Data deduplication is a lossless data compression technique for network storage systems. It effectively curbs the rapid growth of storage overhead and lowers the cost of building, operating, and managing a storage system. As data volumes grow rapidly, deduplication has attracted wide attention from both academia and industry. Many technical problems remain open, however, including how to raise the compression ratio, shorten processing time, and improve data reliability. This thesis addresses these problems from three angles: the deduplication process itself, data reliability under deduplication, and the data distribution strategy of the storage back end. Using a theoretical analysis model and real data sets, it first studies the factors that determine deduplication effectiveness and finds that the duplication characteristics of the target data strongly influence the result. It therefore proposes a deduplication strategy based on duplication characteristics that optimizes both compression ratio and processing time. The strategy consists of a semantic data grouping strategy and a progressive method for deciding chunking granularity. The semantic grouping strategy uses semantic information to distinguish the duplication characteristics and similarity of the data and groups the target data accordingly. The progressive granularity method then works on each data group as a unit and sets that group's chunking strategy according to its duplication characteristics.
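The abstract does not give the details of the grouping or granularity methods, so the following is only a minimal sketch of the general idea under stated assumptions: file extension stands in for the semantic information, chunks are fixed-size and fingerprinted with SHA-256, and the chunk size is halved for a group only while the deduplication ratio keeps improving by a hypothetical threshold `min_gain`. All function names and parameters here are illustrative and are not taken from the thesis.

```python
import hashlib
from collections import defaultdict


def dedup_ratio(blobs, chunk_size):
    """Return logical bytes / unique chunk bytes for fixed-size chunking."""
    seen = {}      # fingerprint -> chunk length
    logical = 0
    for data in blobs:
        for off in range(0, len(data), chunk_size):
            chunk = data[off:off + chunk_size]
            logical += len(chunk)
            seen.setdefault(hashlib.sha256(chunk).hexdigest(), len(chunk))
    stored = sum(seen.values())
    return logical / stored if stored else 1.0


def group_by_extension(files):
    """Group {name: bytes} by file extension (assumed stand-in for semantics)."""
    groups = defaultdict(list)
    for name, data in files.items():
        ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
        groups[ext].append(data)
    return dict(groups)


def pick_chunk_size(blobs, start=4096, floor=512, min_gain=1.05):
    """Halve the chunk size while the ratio improves by at least min_gain.

    start, floor and min_gain are hypothetical tuning values.
    """
    size, best = start, dedup_ratio(blobs, start)
    while size // 2 >= floor:
        ratio = dedup_ratio(blobs, size // 2)
        if ratio < best * min_gain:
            break  # finer chunks no longer pay for their extra metadata
        size, best = size // 2, ratio
    return size, best
```

Calling `pick_chunk_size` once per group returned by `group_by_extension` gives each group its own granularity, so groups with little duplication keep coarse chunks and avoid the processing cost of fine-grained chunking.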
The experimental results show that, compared with other deduplication solutions, the strategy based on duplication characteristics achieves better overall performance in compression ratio and processing time. To address data reliability under deduplication, an optimal redundancy calculation model is proposed that improves the reliability of the target data according to reference heat. To make the theoretical model applicable to a real storage system, the thesis improves its feasibility by computing empirical values over a sample space of data units, and proposes a data redundancy strategy based on reference heat. The optimal redundancy is configured from the attributes of each data unit (its size and its reference heat) so that the target data set reaches the best data reliability at the minimum storage cost. Simulation results demonstrate the feasibility and effectiveness of the reference-heat-based redundancy strategy. To address the inflexibility of current data distribution strategies, a capacity-aware data distribution strategy is proposed that improves storage load balance when physical nodes have unequal storage resources. The strategy covers two cases. When data redundancy is not considered, a capacity-aware data distribution strategy is proposed that builds on the consistent hashing data distribution algorithm and introduces the idea of virtualization: storage resources are allocated through virtual nodes, and a load balancing method based on node capacity awareness optimizes the distribution of data load among physical storage nodes.
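One common way to combine consistent hashing, virtual nodes, and capacity awareness is to give each physical node a number of virtual nodes proportional to its capacity. The sketch below illustrates that general technique only; the class name `CapacityAwareRing`, the `vnodes_per_unit` parameter, and the use of SHA-256 as the ring hash are assumptions, and the thesis's own allocation and balancing methods may differ.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on a 64-bit hash ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class CapacityAwareRing:
    """Consistent-hash ring whose virtual-node count scales with capacity."""

    def __init__(self, capacities, vnodes_per_unit=100):
        # capacities: {node_name: relative capacity as a positive integer}
        # vnodes_per_unit is a hypothetical tuning value.
        points = []
        for node, cap in capacities.items():
            for i in range(cap * vnodes_per_unit):
                points.append((_hash(f"{node}#{i}"), node))
        points.sort()
        self._points = points
        self._keys = [h for h, _ in points]

    def locate(self, key: str) -> str:
        """Return the physical node owning the first virtual node clockwise."""
        idx = bisect.bisect_right(self._keys, _hash(key)) % len(self._keys)
        return self._points[idx][1]
```

With `CapacityAwareRing({"small": 1, "large": 3})`, the node `large` owns three times as many virtual nodes as `small` and therefore receives roughly three times as many keys, which is the kind of capacity-proportional load the strategy aims for.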
When data redundancy is considered, a data distribution strategy supporting multiple redundancy degrees is proposed, which gives the data redundancy policy flexible platform support and improves storage load balance. Simulation results show that both data distribution strategies improve the balance of stored data load in their respective application scenarios.
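The abstract does not describe how copies are placed when redundancy is enabled. A widely used approach on a consistent-hash ring, shown here purely as an illustrative assumption, is to walk clockwise from the key and pick the first distinct physical nodes, with the number of copies chosen per data unit. The names `build_ring` and `place_replicas` are hypothetical.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a point on a 64-bit hash ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def build_ring(nodes, vnodes=100):
    """Return a sorted list of (position, node) virtual-node entries."""
    return sorted((ring_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))


def place_replicas(ring, key, copies):
    """Walk clockwise from the key and collect `copies` distinct physical nodes."""
    distinct = len({node for _, node in ring})
    if not 1 <= copies <= distinct:
        raise ValueError("copies must be between 1 and the number of physical nodes")
    positions = [pos for pos, _ in ring]
    start = bisect.bisect_right(positions, ring_hash(key))
    chosen = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in chosen:      # skip virtual nodes of already chosen hosts
            chosen.append(node)
            if len(chosen) == copies:
                break
    return chosen
```

Because `copies` is an argument of each call, a data unit that needs higher redundancy can simply request more copies than one that needs less, while all units share the same ring.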
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Doctoral
[Year degree awarded]: 2012
[Classification number]: TP333