Research and Application of a MongoDB-Based Storage System for Massive Numbers of Large, Medium, and Small Files
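Keywords: massive large, medium, and small files; storage model; data interface; balancing algorithm. Source: China University of Geosciences (Beijing), master's thesis, 2016. Document type: degree thesis.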
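[Abstract (translated from the Chinese)]: With the continued growth of the Internet and the spread of social networks, heterogeneous network data keeps increasing. Optimizing the storage of massive numbers of small files has become an important research direction in mass data storage. Distributed file systems such as HDFS and TFS lack generality when handling massive numbers of small files. As NoSQL technology has matured, however, its distributed-system strengths, simplicity, and flexibility have made it a possible solution for storing massive numbers of small files. The Meridian Project data center processes space science data files produced by observation equipment across China. As the volume of space observation data has grown, by the end of 2015 the data center had accumulated 9.8018 million scientific data files totaling about 3.45 TB; 90% of them are small files under 100 KB, and the rest are a small number of medium and large files. The Meridian Project currently stores its scientific data in a traditional distributed file system, which, when handling large numbers of small files, leads to excessive disk I/O, long backup times, and low storage efficiency. Based on the characteristics of Meridian Project data files and a thorough analysis of the strengths and weaknesses of today's mainstream mass data storage schemes, this thesis proposes the ZW-Mongo storage model, built on MongoDB. The model has three main design elements: (1) it uses MongoDB's BSON data structure to store small files directly, improving small-file storage efficiency; (2) it stores large files in chunks and builds a metadata collection and a chunk-data collection; (3) it uses historical versions and soft deletion to improve file utilization. The ZW-Mongo storage model improves the storage and access efficiency of small files and effectively reduces file management costs. After analyzing the shortcomings of MongoDB's data balancing strategy, the thesis proposes a data balancing strategy based on consistent hashing and builds a file storage procedure on the consistent hashing algorithm. On top of the ZW-Mongo storage model, the thesis designs and develops a set of REST-style data access interfaces and also implements an access interface for the data balancing algorithm, which makes it easy to add and remove data nodes. Finally, comparison tests between the ZW-Mongo data interface and a traditional distributed file system show that ZW-Mongo outperforms the traditional storage mode in data reading, querying, and backup, while the two are roughly comparable in data writing. Data balancing tests with virtual nodes show that increasing the number of virtual nodes promotes a balanced distribution across data nodes. The ZW-Mongo storage model has been deployed in the data storage subsystem of the Meridian Project data center, with good results.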
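The three design elements of the storage model can be illustrated with a minimal in-memory sketch. It is not the thesis's implementation: the class `FileStore`, the 100 KB small-file threshold, the 256 KB chunk size, and all field names are assumptions chosen for illustration, and plain Python lists stand in for the MongoDB metadata and chunk-data collections. (MongoDB caps a single BSON document at 16 MB, so some size-based split between embedded and chunked storage is needed in any case.)

```python
import hashlib

SMALL_FILE_LIMIT = 100 * 1024  # assumed threshold: files up to 100 KB are embedded
CHUNK_SIZE = 256 * 1024        # assumed chunk size for larger files


class FileStore:
    """In-memory stand-in for a metadata collection and a chunk collection."""

    def __init__(self):
        self.meta = []    # one document per stored file version
        self.chunks = []  # one document per chunk of a large file

    def put(self, name, data):
        """Store a new version of `name` and return its version number."""
        version = 1 + sum(1 for m in self.meta if m["name"] == name)
        doc = {
            "name": name,
            "version": version,
            "size": len(data),
            "md5": hashlib.md5(data).hexdigest(),
            "deleted": False,
            "data": None,
        }
        if len(data) <= SMALL_FILE_LIMIT:
            # (1) Small file: payload embedded directly in the metadata document.
            doc["data"] = data
        else:
            # (2) Large file: payload split into ordered chunks.
            for n, start in enumerate(range(0, len(data), CHUNK_SIZE)):
                self.chunks.append({
                    "name": name,
                    "version": version,
                    "n": n,
                    "data": data[start:start + CHUNK_SIZE],
                })
        self.meta.append(doc)
        return version

    def get(self, name, version=None):
        """Return the latest (or a specific) non-deleted version's bytes."""
        docs = [m for m in self.meta if m["name"] == name and not m["deleted"]]
        if version is not None:
            docs = [m for m in docs if m["version"] == version]
        if not docs:
            raise KeyError(name)
        doc = max(docs, key=lambda m: m["version"])
        if doc["data"] is not None:
            return doc["data"]
        parts = sorted(
            (c for c in self.chunks
             if c["name"] == name and c["version"] == doc["version"]),
            key=lambda c: c["n"],
        )
        return b"".join(c["data"] for c in parts)

    def soft_delete(self, name):
        """(3) Soft delete: flag every version as deleted but keep the data."""
        for m in self.meta:
            if m["name"] == name:
                m["deleted"] = True
```

In this sketch every `put` appends a new version rather than overwriting, so older versions stay readable through `get(name, version)` until they are soft-deleted. With a real MongoDB deployment the two lists would correspond to two collections, similar in spirit to the files and chunks collections used by MongoDB's GridFS.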
[Abstract]: With the development of the Internet and the growing popularity of social networks, heterogeneous network data keeps increasing, and optimizing the storage of massive numbers of small files has become an important research direction in mass data storage technology. Distributed file systems such as HDFS and TFS lack generality when dealing with massive numbers of small files. As NoSQL technology matures, its distributed-system advantages, simplicity, and flexibility make it a possible way to store massive numbers of small files. The Meridian Project data center is responsible for processing space science data files generated by observation equipment from all over the country. As space observation data continued to increase, by the end of 2015 the data center had accumulated 9.8018 million scientific data files with a total size of about 3.45 TB. Of these, 90% are small files below 100 KB, and the rest are a small number of medium and large files. At present, the Meridian Project uses a traditional distributed file system to store scientific data. When dealing with many small files, disk I/O is too high, data backup takes too long, and storage efficiency is low. Based on the characteristics of Meridian Project data files and an analysis of the advantages and disadvantages of current mainstream mass data storage schemes, this thesis proposes the ZW-Mongo storage model on the basis of MongoDB. The storage model includes three design aspects: (1) using MongoDB's BSON data structure to store small files directly, which improves small-file storage efficiency; (2) storing large files in chunks, with a metadata collection and a chunk-data collection; (3) using historical versions and soft deletion to improve file utilization. The ZW-Mongo storage model improves the storage and access efficiency of small files and reduces file management costs. By analyzing the shortcomings of MongoDB's data balancing strategy, this thesis proposes a data balancing strategy based on consistent hashing and builds a file storage procedure on the consistent hashing algorithm.
Based on the ZW-Mongo storage model, a set of REST-style data access interfaces is designed and developed. An access interface for the data balancing algorithm is also implemented, which simplifies adding and removing data nodes. Finally, comparison tests between the ZW-Mongo data interface and a traditional distributed file system show that the ZW-Mongo storage model is superior to the traditional storage mode in data reading, querying, and backup, while the two are basically similar in data writing. Data balancing tests with virtual nodes show that increasing the number of virtual nodes promotes a balanced distribution of data across data nodes. The ZW-Mongo storage model has been applied in the data storage subsystem of the Meridian Project data center, and the application results are good.
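The abstract does not give the details of the balancing algorithm, so the following is only a generic sketch of consistent hashing with virtual nodes, the technique it names. The class `HashRing`, the `vnodes` parameter, the MD5-based position function, and the `shard-*` node names are all assumptions for illustration, not the thesis's interface.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map a string to a point on the ring (128-bit MD5 value)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hash ring; each data node owns `vnodes` virtual points."""

    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> data node name

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            p = _position(f"{node}#{i}")
            self._owner[p] = node
            bisect.insort(self._points, p)

    def remove_node(self, node: str) -> None:
        # Filter the point list first, while the owner map is still complete.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def locate(self, key: str) -> str:
        """Return the node owning the first ring point clockwise from `key`."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _position(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(vnodes=50)
for name in ("shard-a", "shard-b", "shard-c"):
    ring.add_node(name)
print(ring.locate("file-000123"))  # one of the three hypothetical shard names
```

The property that matters for adding and removing data nodes is that a membership change only moves the keys adjacent to the changed node's points: when a node joins, each key either stays where it was or moves to the new node. Raising `vnodes` spreads each node's points more finely around the ring, which is the mechanism behind the abstract's observation that more virtual nodes give a more even distribution.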
[Degree-granting institution]: China University of Geosciences (Beijing)
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TP333
Article ID: 1519015
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1519015.html