Analysis and Optimization of the Data Read/Write Flow in a Distributed File System
Published: 2018-08-14 14:54
[Abstract]: In the era of big data, storage systems play an increasingly important role in many practical applications, and their read/write performance directly affects the performance of the applications built on top of them. Current distributed file systems rely on scalability to keep up with ever-rising performance demands, but growing the system's scale easily raises costs and makes maintenance harder. Although object-based file systems exploit the intelligence of the storage devices, they overlook the fact that all components of a storage system form an organic whole. The key to a storage system's performance is whether it can fully exploit the strengths of each node and make full use of the interconnection network between the nodes.

This thesis studies the data read and write paths of a storage system and optimizes the key steps that affect system performance. All of the work was implemented and tested in Cappella, an object-based distributed file system developed in the laboratory.

For the write path, a dynamic layout scheme based on the real-time load of the storage servers was designed and implemented. Each storage server carries a real-time weight indicating how busy it is; when a file is laid out, servers are chosen by weighted random selection according to the real-time load of all storage servers. This solves the load imbalance that Cappella's original static layout was prone to.

For the read path, the original data prefetching (readahead) algorithm of the Linux kernel is analyzed in detail, and, to address its shortcomings, a data prefetching strategy suited to distributed environments is designed and implemented. The Linux prefetching algorithm was designed around the constraints of a local file system with disks as the storage devices, and it falls short in a distributed environment.
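As a point of reference for the baseline being criticized, here is a minimal Python model of Linux-style on-demand readahead window sizing, written loosely after the heuristics in the kernel's `mm/readahead.c` from around the time of this thesis (`get_init_ra_size` and `get_next_ra_size`). A new sequential stream starts with a window two to four times the request size (rounded up to a power of two), the window then quadruples while small and doubles afterwards, and it is capped at a maximum. This is a sketch, not kernel code: the asynchronous readahead marker, page-cache hits, and interleaved streams are ignored, and `ReadaheadModel` and the helper names are illustrative.

```python
def roundup_pow_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()


def init_ra_size(req_pages: int, max_pages: int) -> int:
    """Initial window for a new sequential stream, scaled from the request size."""
    size = roundup_pow_of_two(req_pages)
    if size <= max_pages // 32:
        return size * 4
    if size <= max_pages // 4:
        return size * 2
    return max_pages


def next_ra_size(cur: int, max_pages: int) -> int:
    """Ramp-up: quadruple while the window is small, then double, capped at max."""
    grown = cur * 4 if cur < max_pages // 16 else cur * 2
    return min(grown, max_pages)


class ReadaheadModel:
    """Readahead window state for one open file; all sizes are in pages."""

    def __init__(self, max_pages: int = 32):
        # 32 pages of 4 KiB = 128 KiB, a long-standing default read_ahead_kb.
        self.max_pages = max_pages
        self.window = 0          # current window size, 0 = no active stream
        self.next_expected = -1  # page at which a sequential read would start

    def on_read(self, start: int, npages: int) -> int:
        """Return the number of pages fetched for this read and update state."""
        if start == 0 or (start == self.next_expected and self.window == 0):
            # Start of file, or a fresh sequential stream: initial window.
            self.window = init_ra_size(npages, self.max_pages)
        elif start == self.next_expected:
            # Sequential continuation: grow the window.
            self.window = next_ra_size(self.window, self.max_pages)
        else:
            # Random read: fetch only what was asked. Unlike the kernel, which
            # leaves its readahead state untouched here, this model simply
            # restarts the ramp-up.
            self.window = 0
            self.next_expected = start + npages
            return npages
        self.next_expected = start + npages
        return self.window


if __name__ == "__main__":
    ra = ReadaheadModel(max_pages=32)
    for page in (0, 4, 8, 12, 1000, 1004):
        # Windows printed: 8, 16, 32, 32, 4, 8
        print(page, ra.on_read(page, 4))
```

Nothing in this model knows which storage server holds the pages it fetches: the window is sized purely from the access pattern, which suits a local disk. The abstract identifies exactly this blind spot (network transfer limits and how data is spread over nodes) as what the distributed prefetching strategy adds; its concrete rules are not given in the abstract.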
In a distributed environment, data is spread across multiple nodes connected by a dedicated high-speed network, so the interconnect between nodes and the way data is distributed across them become the keys to optimizing system performance. The proposed prefetching algorithm takes both the constraints of network transmission and the characteristics of data distribution into account, and effectively improves system performance.

Test results show that data is distributed sensibly across the storage servers in proportion to their weights, and that read bandwidth improves by more than 30% for sequential access and for large-block random access, with a maximum improvement of nearly 90%.
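The write-path layout scheme, and the weight-proportional data distribution reported in the results, can be illustrated with a small Python sketch of weighted random server selection. The abstract does not say how Cappella turns load into a weight or how many servers a file is spread across, so `weight_from_load`, `pick_servers`, the server names, and the stripe count `k` are all hypothetical. Here an idler server simply gets a larger weight and is therefore chosen more often, and servers for one file are drawn without replacement.

```python
import random
from collections import Counter


def weight_from_load(pending_requests: int) -> float:
    """Hypothetical load-to-weight mapping: idler servers get larger weights."""
    return 1.0 / (1.0 + pending_requests)


def pick_servers(weights: dict[str, float], k: int, rng: random.Random) -> list[str]:
    """Pick k distinct servers for a new file, each draw biased by weight.

    Weighted sampling without replacement: draw one server in proportion to
    its weight, remove it from the pool, repeat.
    """
    if k > len(weights):
        raise ValueError("stripe count exceeds number of storage servers")
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive")
    pool = dict(weights)
    chosen = []
    for _ in range(k):
        names = list(pool)
        # random.choices draws in proportion to the given weights.
        name = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        chosen.append(name)
        del pool[name]
    return chosen


if __name__ == "__main__":
    rng = random.Random(42)
    # Pending requests per storage server, as a hypothetical monitor might report.
    load = {"oss0": 0, "oss1": 1, "oss2": 3, "oss3": 3}
    weights = {s: weight_from_load(q) for s, q in load.items()}
    # weights: oss0=1.0, oss1=0.5, oss2=0.25, oss3=0.25
    placements = Counter(pick_servers(weights, 1, rng)[0] for _ in range(20000))
    print(placements)  # roughly 50% / 25% / 12.5% / 12.5%
```

Drawing randomly in proportion to the weights, rather than always picking the single least-loaded server, is one common way to keep every client from piling onto the same idle server between load reports; the abstract does not state whether that was Cappella's rationale.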
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Master's
[Year awarded]: 2013
[Classification number]: TP333
Article ID: 2183215
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2183215.html