A Large-Scale Data Object Storage System Based on a Central-Node Architecture
Published: 2018-08-17 15:06
[Abstract]: With the arrival of massive data, big data has drawn wide attention. Its variety and scale have driven the rapid adoption of object storage, which is becoming a new mode of storage. Object storage gives end users a unified storage space. Each object has a unique access identifier, generated when the object is created, and the system offers users two basic operations: PUT to upload an object and GET to download it. Because it is simple and easy to use, object storage has been widely adopted.

Existing large-object storage systems built on a central-node architecture, such as GFS and HDFS, have two problems: their metadata grows linearly as the scale of stored objects grows, and their single physical storage block configuration cannot store small objects efficiently. Small-object storage systems built on a central-node architecture, such as Haystack, support only small objects and do not support multi-replica concurrency.

To address these problems, this thesis designs and implements LaUDObject, a user-aware object storage system that supports large objects effectively and small objects efficiently.

The main contributions are:
(1) To keep the object replica location table on the master node from growing too large, multiple objects are combined into a group, and the replicas of a group are stored together, contiguously, on a single node. The master node keeps a replica location table per object group rather than per object, which reduces the size of the table. The middle 32 bits of an object identifier give the object's group number, and the last 32 bits give the object's ordinal within the group.
(2) A multi-replica sequential consistency policy that supports concurrent updates to small objects is implemented, which improves the efficiency of client-side object updates.
(3) The small objects in a group are merged into one large file with an external index, so that a read completes with a single disk access. This speeds up access to small objects.
(4) By being aware of user identifiers and associating object groups with users, the system stores the data of the same user together, which can improve overall access efficiency.

Performance comparison experiments among LaUDObject, Hadoop, and Cassandra, across several large-file and small-file application scenarios, provide preliminary evidence that the approach is effective.
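The identifier layout and the group-level replica location table described in item (1) can be sketched as follows. This is a minimal illustration and not the thesis's implementation. The abstract only states that the middle 32 bits hold the group number and the last 32 bits hold the ordinal within the group, so the 96-bit total width, the leading `prefix` field, the node names, and the `pack`, `unpack`, and `locate` helpers are all assumptions made for the example.

```python
from typing import Dict, List, NamedTuple

FIELD_BITS = 32
FIELD_MASK = (1 << FIELD_BITS) - 1


class ObjectId(NamedTuple):
    prefix: int  # leading 32 bits; meaning not given in the abstract (assumption)
    group: int   # middle 32 bits: object group number
    seq: int     # last 32 bits: ordinal of the object within its group


def pack(oid: ObjectId) -> int:
    """Pack the three fields into one integer (assumed 96-bit layout)."""
    for value in oid:
        if not 0 <= value <= FIELD_MASK:
            raise ValueError("each field must fit in 32 bits")
    return (oid.prefix << (2 * FIELD_BITS)) | (oid.group << FIELD_BITS) | oid.seq


def unpack(raw: int) -> ObjectId:
    """Recover the three fields from a packed identifier."""
    return ObjectId(
        prefix=(raw >> (2 * FIELD_BITS)) & FIELD_MASK,
        group=(raw >> FIELD_BITS) & FIELD_MASK,
        seq=raw & FIELD_MASK,
    )


# Hypothetical master-node table: one entry per object group,
# not one entry per object, so it grows with the number of groups.
group_replicas: Dict[int, List[str]] = {
    7: ["node-a", "node-b", "node-c"],
}


def locate(raw: int) -> List[str]:
    """Return the replica nodes of an object by looking up its group."""
    return group_replicas[unpack(raw).group]


if __name__ == "__main__":
    raw_id = pack(ObjectId(prefix=1, group=7, seq=42))
    print(unpack(raw_id))
    print(locate(raw_id))
```

Because every object in a group shares one table entry, a lookup needs only the group number extracted from the identifier. The ordinal would then select the object inside the group's storage on the chosen node.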
[Degree-granting institution]: Tsinghua University
[Degree level]: Master's
[Year degree conferred]: 2013
[Classification number]: TP333
Article ID: 2188028
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2188028.html