Design and Implementation of a Cloud-Storage-Based Deduplication File System
Published: 2018-03-29 00:03
Topic: data deduplication; Focus: cloud storage; Source: Huazhong University of Science and Technology, Master's thesis, 2013
[Abstract]: With growing demand for online storage, the major cloud storage companies have begun to explore paid service models in which better service is available only to paying users. Free cloud storage space no longer meets users' needs, and the cost of cloud storage has begun to affect their work and daily life. To address this problem, a deduplicating file system based on cloud storage is proposed.
The system is a client-side file system with incremental synchronization to cloud storage. Using data deduplication, it automatically uploads the user's local data to the cloud without redundancy. The system consists of six modules. The user interface module receives system requests passed up from the FUSE kernel space and invokes the relevant modules to respond. The cloud synchronization module uses the cloud storage provider's open API and works with the other modules to keep local and cloud data in sync. The file management module fetches the file list from the cloud, builds file index nodes (inodes), and organizes and manages the files. The file operation module handles read and write requests. The data deduplication module deduplicates at the source using a content-based variable-length chunking algorithm: a fixed-length sliding window computes a fingerprint over the file data, and whenever the fingerprint modulo a specific integer equals a predetermined value, the window position becomes a chunk boundary; chunks with identical fingerprints are treated as duplicates. The deduplicated file and a metadata table recording the chunk information are then uploaded to the cloud. The garbage collection module reclaims unused tables and redundant data files when the file system is unmounted.
The deduplication ratio was evaluated on multiple versions of kernel files and on virtual machine files. The results show that on large-scale document data the deduplication rate reaches up to 67%. Based on Alibaba Cloud's pricing, this would theoretically save 4391 yuan per year for 1 TB of user data.
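The chunking and deduplication steps described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis's implementation: a polynomial (Rabin-Karp style) rolling hash stands in for the thesis's fingerprint function, SHA-1 identifies whole chunks, the window size, divisor, target value and min/max chunk sizes are hypothetical, and an in-memory class (`DedupStore`, a hypothetical name) stands in for the cloud side. Each file's list of chunk fingerprints plays the role of the metadata table, and `gc` mimics the garbage collection module by dropping chunks that no table references.

```python
import hashlib

# All parameters below are illustrative assumptions, not values from the thesis.
WINDOW = 48              # sliding-window length in bytes
BASE = 257               # polynomial base of the rolling hash
MOD = (1 << 61) - 1      # Mersenne prime modulus
DIVISOR = 4096           # the "specific integer" the fingerprint is taken modulo
TARGET = DIVISOR - 1     # the "predetermined value" that marks a boundary
MIN_CHUNK = 1024         # hypothetical lower bound on chunk size
MAX_CHUNK = 16 * 1024    # hypothetical upper bound on chunk size


def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined chunks of `data`."""
    drop = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving a full window
    fp = 0
    start = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            fp = (fp - data[i - WINDOW] * drop) % MOD  # remove the oldest byte
        fp = (fp * BASE + byte) % MOD                  # append the newest byte
        size = i + 1 - start
        if size >= MIN_CHUNK and (fp % DIVISOR == TARGET or size >= MAX_CHUNK):
            yield start, i + 1
            start = i + 1
    if start < len(data):
        yield start, len(data)  # trailing chunk


class DedupStore:
    """Hypothetical in-memory stand-in for the cloud side."""

    def __init__(self):
        self.chunks = {}   # chunk fingerprint -> chunk bytes
        self.recipes = {}  # file name -> ordered fingerprints (the "metadata table")

    def put(self, name: str, data: bytes) -> int:
        """Store a file; return how many bytes were actually new (uploaded)."""
        recipe, new_bytes = [], 0
        for s, e in chunk_boundaries(data):
            piece = data[s:e]
            fid = hashlib.sha1(piece).hexdigest()
            if fid not in self.chunks:  # identical fingerprint => duplicate, skip
                self.chunks[fid] = piece
                new_bytes += len(piece)
            recipe.append(fid)
        self.recipes[name] = recipe
        return new_bytes

    def get(self, name: str) -> bytes:
        """Rebuild a file from its metadata table."""
        return b"".join(self.chunks[f] for f in self.recipes[name])

    def delete(self, name: str) -> None:
        del self.recipes[name]

    def gc(self) -> int:
        """Drop chunks no metadata table references; return how many were freed."""
        live = {f for recipe in self.recipes.values() for f in recipe}
        dead = [f for f in self.chunks if f not in live]
        for f in dead:
            del self.chunks[f]
        return len(dead)
```

Because each boundary decision depends only on the bytes inside the window, inserting data into a file typically shifts only a few chunk boundaries near the edit; the remaining chunks keep their fingerprints and are not uploaded again, which is what keeps incremental synchronization cheap.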
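[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP333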
Article ID: 1678640
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1678640.html