Design and Implementation of a Hadoop-Based File Synchronization Storage System
Published: 2019-03-06 13:17
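[Abstract]: In the cloud computing era, network terminal devices are in widespread use and Internet technology has become ever more pervasive. As a result, data storage and backup are now closely tied to both personal life and the operation of organizations, and enterprises and individuals alike face the problem of managing massive amounts of data. Cloud storage and its related technologies have transformed the field of data storage. Online storage systems built on cloud storage can offer users permanent, scalable, convenient and inexpensive data storage and backup services. Relatively mature domestic products include Kingsoft Kuaipan and Huawei Netdisk. Both provide stable data storage and file synchronization, but they still have several shortcomings. First, the file system monitoring provided by their clients is incomplete. Second, file synchronization is inefficient in some situations. Third, some products offer neither secure data transmission nor separate data transfer handling for the different kinds of synchronization events. Finally, existing products do not yet offer encrypted storage of data on either the client or the server. Vendors of cloud-based synchronized storage services also need to optimize the cloud storage platform that underpins their data storage.
This thesis takes the perspective of users of online synchronized storage services. It first summarizes the main functions and the shortcomings of current products. Starting from these requirements and problems, it then studies the key technologies needed to build a cloud-based file synchronization storage system. Finally, it designs and implements a file synchronization storage system that combines a Hadoop-based cloud storage back end with the Rsync synchronization algorithm. The main work is as follows:
- Comparable products in China and abroad are analyzed for their strengths and weaknesses, which clarifies user requirements.
- The open-source jpathwatch class library monitors changes to the client's virtual disk in real time. This provides real-time triggering and notification for different types of synchronization events and adds detection of file moves and renames.
- Synchronization events are classified so that each class is handled separately. This applies in particular to file content updates and resumed-transfer events, for which a synchronization protocol based on the Rsync algorithm was designed to reduce the data exchanged between the two parties and improve synchronization efficiency.
- The most suitable data transfer method is chosen for each kind of synchronization task, and HTTPS provides encrypted transmission.
- Data is stored in a Hadoop-based cloud storage back end.
The system is designed and implemented using a layered, modular approach. The last two chapters test and analyze the system's functional modules, summarize the research results and the system's possible extensions, and outline future work.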
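The abstract says the client detects changes on its virtual disk, including file moves and renames, and then classifies the resulting synchronization events so that each class is handled differently. The thesis does jpathwatch-based detection, but the abstract does not show how. The following is a hypothetical stand-in that does not use jpathwatch at all. Instead, it diffs two snapshots of the watched folder. It assumes each file carries a stable identifier (comparable to an inode number) that survives a move or rename, and it routes each event kind to a handler name. `Snapshot`, `SyncEvent`, `classify`, `HANDLER_FOR_KIND` and the handler names are all illustrative, not the thesis's API.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Dict, List, Tuple

# Hypothetical snapshot format: relative path -> (file_id, content_hash).
# file_id stands in for an inode number, which survives a move or rename.
Snapshot = Dict[str, Tuple[int, str]]

# Illustrative routing table: only content changes need file data.
HANDLER_FOR_KIND = {
    "create": "upload_full",
    "modify": "upload_delta",    # Rsync-style delta transfer
    "rename": "remote_rename",   # metadata only, no file data sent
    "move": "remote_rename",
    "delete": "remote_delete",
}


@dataclass(frozen=True)
class SyncEvent:
    kind: str           # one of the keys of HANDLER_FOR_KIND
    path: str           # current path (or the deleted path)
    old_path: str = ""  # previous path, set only for rename/move


def classify(before: Snapshot, after: Snapshot) -> List[SyncEvent]:
    """Turn the difference between two snapshots into typed sync events."""
    old_by_id = {fid: (path, h) for path, (fid, h) in before.items()}
    new_by_id = {fid: (path, h) for path, (fid, h) in after.items()}
    events: List[SyncEvent] = []

    for fid, (new_path, new_hash) in new_by_id.items():
        if fid not in old_by_id:
            events.append(SyncEvent("create", new_path))
            continue
        old_path, old_hash = old_by_id[fid]
        if old_path != new_path:
            # Same parent directory -> rename; different parent -> move.
            same_dir = PurePosixPath(old_path).parent == PurePosixPath(new_path).parent
            events.append(SyncEvent("rename" if same_dir else "move", new_path, old_path))
        if old_hash != new_hash:
            events.append(SyncEvent("modify", new_path))

    for fid, (old_path, _) in old_by_id.items():
        if fid not in new_by_id:
            events.append(SyncEvent("delete", old_path))
    return events
```

A real client could build each snapshot with `os.scandir`, using `stat().st_ino` and a content hash, for example. Pairing a vanished path with a reappearing identifier is what lets a rename be synchronized as a cheap metadata operation instead of a delete followed by a full upload.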
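The synchronization protocol in the abstract is built on the Rsync algorithm, which saves traffic by sending only what changed. The following is a minimal single-process sketch of that algorithm's core; it is not the thesis's protocol.

The side holding the old file (for example the server) splits it into fixed-size blocks and computes, for each block, a weak rolling checksum and a strong hash (MD5 here). The side holding the new file slides a window over it and updates the weak checksum one byte at a time. For each position it emits either the index of an old block to copy or literal bytes. The block size, hash choice and function names are assumptions, and network framing is omitted.

```python
import hashlib
from typing import Dict, List, Tuple, Union

MOD = 1 << 16  # the rsync weak checksum works modulo 2**16
Weak = Tuple[int, int]
Op = Union[int, bytes]  # int = copy old block #i, bytes = literal data


def weak_checksum(block: bytes) -> Weak:
    """Adler-style checksum (a, b) as described for rsync."""
    n = len(block)
    a = sum(block) % MOD
    b = sum((n - i) * x for i, x in enumerate(block)) % MOD
    return a, b


def roll(weak: Weak, out_byte: int, in_byte: int, n: int) -> Weak:
    """Slide an n-byte window one byte to the right in O(1)."""
    a, b = weak
    a = (a - out_byte + in_byte) % MOD
    b = (b - n * out_byte + a) % MOD
    return a, b


def signatures(old: bytes, block_size: int) -> List[Tuple[Weak, bytes]]:
    """Computed by the side that holds the old file."""
    blocks = (old[i:i + block_size] for i in range(0, len(old), block_size))
    return [(weak_checksum(blk), hashlib.md5(blk).digest()) for blk in blocks]


def delta(new: bytes, sigs: List[Tuple[Weak, bytes]], block_size: int) -> List[Op]:
    """Computed by the side that holds the new file."""
    table: Dict[Weak, List[Tuple[int, bytes]]] = {}
    for idx, (weak, strong) in enumerate(sigs):
        table.setdefault(weak, []).append((idx, strong))

    ops: List[Op] = []
    literal = bytearray()
    pos, n = 0, len(new)
    window = None  # weak checksum of new[pos:pos + size], if known
    while pos < n:
        size = min(block_size, n - pos)
        if window is None:
            window = weak_checksum(new[pos:pos + size])
        match = None
        for idx, strong in table.get(window, []):
            # The weak checksum only filters candidates; the strong hash confirms.
            if hashlib.md5(new[pos:pos + size]).digest() == strong:
                match = idx
                break
        if match is not None:
            if literal:
                ops.append(bytes(literal))
                literal = bytearray()
            ops.append(match)
            pos += size
            window = None
        else:
            literal.append(new[pos])
            if pos + block_size < n:
                window = roll(window, new[pos], new[pos + block_size], block_size)
            else:
                window = None  # the window shrinks near the end: recompute
            pos += 1
    if literal:
        ops.append(bytes(literal))
    return ops


def patch(old: bytes, ops: List[Op], block_size: int) -> bytes:
    """Rebuild the new file on the side that holds the old one."""
    out = bytearray()
    for op in ops:
        if isinstance(op, int):
            out += old[op * block_size:(op + 1) * block_size]
        else:
            out += op
    return bytes(out)
```

Consider an insertion in the middle of a large file. Only the bytes around the edit travel as literals, and every untouched block is replaced by a small index. Because the weak checksum can be rolled in constant time, the sender can test every byte offset cheaply and pay for a strong hash only on candidate hits.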
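The abstract says that data travels over HTTPS and is stored in a Hadoop-based back end, but it does not say how the two are connected. One plausible arrangement, assumed here purely for illustration, is for the storage tier to write into HDFS through the WebHDFS REST API over HTTPS.

The sketch covers two pieces. First, it builds the first request of a WebHDFS `CREATE`: a `PUT` carrying `op=CREATE`, which the NameNode answers with a redirect to a DataNode. Second, it creates a TLS context that verifies the server certificate and hostname. The host, port, user name and helper names are hypothetical placeholders.

```python
import ssl
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def webhdfs_create_url(host: str, port: int, hdfs_path: str, user: str,
                       overwrite: bool = True) -> str:
    """First-step URL of a WebHDFS CREATE, sent to the NameNode over HTTPS."""
    if not hdfs_path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute")
    query = urlencode({
        "op": "CREATE",
        "overwrite": "true" if overwrite else "false",
        "user.name": user,  # simple (non-Kerberos) authentication
    })
    return f"https://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


def tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS settings: verify the certificate chain and hostname."""
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def create_request(url: str) -> Request:
    """Step 1 is a PUT with no body. The NameNode replies with a redirect
    whose Location header names the DataNode that will receive the data."""
    return Request(url, method="PUT")
```

Actually sending the request is left out because it needs a running cluster. A caller would send this first request, read the `Location` header from the redirect, and then issue a second `PUT` that carries the file data.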
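[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree]: Master's
[Year awarded]: 2012
[Classification number]: TP333
Article ID: 2435552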
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2435552.html