Research on Key Technologies of Parallel File Storage Systems
Published: 2018-10-25 13:15
[Abstract]: With the development of the Internet and the steady rise of informatization and digitization, data volumes are growing explosively. Although the capacity and performance of traditional single-machine storage have improved greatly over the past few decades, single-machine storage still cannot cope with massive amounts of data. How to build a data storage system with high performance, large capacity, high reliability, and high scalability has therefore become an important problem, and distributed parallel file storage systems emerged against this background.
Distributed parallel file storage systems are currently a hot research topic in both academia and industry, and research institutions and enterprises have already produced many results. However, most of the products they have released are designed around their own business requirements and have considerable limitations and shortcomings, so substantial room for research and improvement remains. The main work of this thesis is as follows:
(1) Mainstream distributed file storage systems such as GFS and Global File System are compared and analyzed, and their strengths and weaknesses are summarized. A new distributed file system architecture and a flat file organization are proposed.
(2) An index structure based on a hash table and a scaling mechanism based on the consistent hashing algorithm are designed. Simulation tests verify that consistent hashing scales better than the traditional hash-modulo algorithm.
(3) By analyzing the implementation principles and details of the Linux file system, the thesis reveals its shortcomings in storing massive numbers of files. On this basis, a data storage scheme for storage nodes based on a merge mechanism is designed and described in detail. Experiments verify that this scheme has better read and write performance than storing data directly on the file system.
(4) Two causes of system load imbalance are analyzed: uneven client access load and hot data. For the first cause, the thesis proposes a load balancing strategy that combines a server load model with the static performance of each node to balance client access load. For the second cause, it proposes a replica count management strategy based on data heat statistics, which dynamically increases the number of replicas of hot data so that the load is spread across multiple nodes.
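The scalability claim in contribution (2) can be illustrated with a small simulation. The sketch below is not the thesis's own experiment, which this abstract does not describe; it is a minimal illustration under stated assumptions: MD5 as the hash function, 100 virtual nodes per server, and hypothetical names such as `ConsistentHashRing` and `compare_scaling`. It measures what fraction of keys change owner when one node is added to a cluster of N nodes. For uniformly spread hashes, hash-modulo placement is expected to relocate roughly N/(N+1) of the keys, while a consistent hash ring relocates roughly 1/(N+1).

```python
import bisect
import hashlib


def stable_hash(text):
    """Map a string to a 64-bit integer (MD5 is an assumption, chosen for even spread)."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")


def modulo_owner(key, nodes):
    """Traditional placement: hash(key) mod N."""
    return nodes[stable_hash(key) % len(nodes)]


class ConsistentHashRing:
    """Hypothetical ring with virtual nodes; not the thesis's implementation."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node contributes `vnodes` points on the ring.
        points = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in points]
        self._owners = [n for _, n in points]

    def owner(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._hashes, stable_hash(key))
        return self._owners[idx % len(self._owners)]


def moved_fraction(keys, before, after):
    """Fraction of keys whose owner differs between two placement functions."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


def compare_scaling(num_keys=20000, old_n=10):
    """Return (modulo_moved, ring_moved) when growing from old_n to old_n + 1 nodes."""
    keys = [f"file-{i}" for i in range(num_keys)]
    old_nodes = [f"node-{i}" for i in range(old_n)]
    new_nodes = old_nodes + [f"node-{old_n}"]
    old_ring = ConsistentHashRing(old_nodes)
    new_ring = ConsistentHashRing(new_nodes)
    modulo_moved = moved_fraction(
        keys,
        lambda k: modulo_owner(k, old_nodes),
        lambda k: modulo_owner(k, new_nodes),
    )
    ring_moved = moved_fraction(keys, old_ring.owner, new_ring.owner)
    return modulo_moved, ring_moved


if __name__ == "__main__":
    modulo_moved, ring_moved = compare_scaling()
    print(f"hash-modulo moved: {modulo_moved:.1%}")
    print(f"consistent hashing moved: {ring_moved:.1%}")
```

On the ring, every relocated key moves to the newly added node, so existing nodes only hand data off and never exchange it among themselves. That is the property that makes expansion cheap compared with hash-modulo placement.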
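[Degree-granting institution]: South China University of Technology
[Degree level]: Master's
[Year degree awarded]: 2012
[Classification number]: TP333
Article ID: 2293800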
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2293800.html