Analysis and Optimization of Data Read/Write Flows in a Distributed File System
Published: 2018-08-14 14:54
[Abstract]: In the era of big data, storage systems play an increasingly important role in many practical applications, and their read/write performance directly affects the performance of the applications built on top of them. Today's distributed file systems all rely on scaling out to meet ever-rising performance demands, but growth in scale readily leads to higher cost and harder maintenance. Object-based file systems exploit the intelligence of storage devices, yet they overlook the fact that all components of a storage system form an organic whole. The key to storage system performance is whether the strengths of every node in the system, and the interconnect between nodes, can be fully exploited.
This thesis focuses on the data read and write flows in a storage system and optimizes the key steps that affect system performance. All of the work was implemented and tested in Cappella, an object-based distributed file system developed in the laboratory.
For the write flow, a dynamic layout scheme driven by the real-time load of the storage servers was designed and implemented. Each storage server has a real-time weight that indicates how busy or idle it is. At file layout time, servers are chosen by a random selection biased by the real-time load of all storage servers. This resolves the load imbalance that Cappella's static layout tends to cause.
For the read flow, the thesis analyzes the Linux kernel's existing data prefetching algorithm in detail and, to address its shortcomings, designs and implements a prefetching strategy suited to distributed environments. The Linux algorithm was designed around the constraints of a local file system with disks as the storage device, and it falls short in a distributed setting. There, data is spread across multiple nodes connected by a dedicated high-speed network, so the interconnect and the way data is distributed across nodes become the keys to optimizing system performance. The distributed prefetching algorithm takes both the limits of network transfer and the characteristics of the data distribution into account, and effectively improves system performance.
Test results show that data is distributed across the storage servers reasonably, in accordance with the server weights, and that read bandwidth improves by more than 30% for sequential access and large-block random access, with a maximum gain of nearly 90%.
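The load-biased random selection described for the write flow can be illustrated with a minimal sketch. This is not Cappella's implementation, which the abstract does not detail. It assumes each storage server reports a load value between 0 (idle) and 1 (saturated) and that the selection weight is simply `1 - load` with a small floor; the names `selection_weights`, `pick_servers`, and `floor` are hypothetical.

```python
import random


def selection_weights(loads, floor=0.05):
    """Map each server's load in [0, 1] to a selection weight.

    Assumption: weight = 1 - load, so idle servers weigh more. The floor
    keeps a saturated server selectable with a small probability.
    """
    return {name: max(1.0 - load, floor) for name, load in loads.items()}


def pick_servers(loads, count, rng=random):
    """Pick `count` distinct servers, biased toward lightly loaded ones."""
    weights = selection_weights(loads)
    if count > len(weights):
        raise ValueError("not enough storage servers")
    chosen = []
    for _ in range(count):
        names = list(weights)
        # Weighted draw of one server from those still available.
        pick = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
        chosen.append(pick)
        del weights[pick]  # sample without replacement
    return chosen
```

Calling `pick_servers({"oss1": 0.9, "oss2": 0.1, "oss3": 0.5}, 2)` returns two distinct server names, with the lightly loaded `oss2` the most likely to appear. Because the choice stays random rather than always taking the least-loaded server, concurrent clients do not all converge on the same node between load updates.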
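[Degree-granting institution]: Huazhong University of Science and Technology (華中科技大學)
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP333
Article ID: 2183215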
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2183215.html