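Research on Throughput Collapse Behavior in Cluster Storage Networks
Published: 2018-07-07 22:34
Topics: cluster storage + network throughput; Source: Huazhong University of Science and Technology, 2012 doctoral dissertation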
[Abstract]: Cluster storage systems are widely used in cloud-era data centers because they are inexpensive and easy to scale. Data centers build these systems on high-speed TCP/Ethernet networks, with multiple storage nodes serving data access requests at the same time. Because the network load in cluster storage is synchronized and concurrent, the client's bandwidth utilization drops sharply as the number of cluster nodes responding to a data request grows. The client's actual network throughput falls to only about 20% of its normal level, which wastes a large share of the data center's network bandwidth and increases the client's data access latency. Researchers call this TCP throughput collapse under the many-to-one communication pattern of cluster storage networks Incast.
First, the Incast phenomenon in cluster storage networks is reproduced through simulation and through experiments on a real cluster. Observations at both the microscopic and macroscopic level show that Incast is pervasive in cluster storage networks, and quantitative modeling is used to analyze its causes. Analysis of the experimental traces shows that the main cause of Incast is TCP timeouts: existing congestion control mechanisms and TCP implementations cannot work to full advantage in a cluster storage network, and the data placement strategy and the high concurrency of the application load aggravate TCP timeouts, and with them the negative effect of Incast on actual throughput. The quantitative model explains why existing TCP congestion control performs poorly in cluster storage networks, attributes the sharp drop in actual throughput to consecutive timeouts caused by transient bursts of packet loss, and can be used to estimate both the probability of a timeout and the resulting throughput. These results provide a theoretical basis for the optimizations and solutions explored next.
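As a rough illustration of why a modest timeout probability is enough to collapse throughput, the sketch below uses a deliberately simple back-of-envelope model. It is not the dissertation's quantitative model, which the abstract does not spell out. All names and inputs (`link_bps`, `block_bytes`, `rto_s`, `p_timeout`) are assumptions chosen for illustration: each synchronized block read takes the ideal transfer time and, with probability `p_timeout`, also stalls for one full retransmission timeout.

```python
def expected_goodput_bps(link_bps, block_bytes, rto_s, p_timeout):
    """Expected goodput of repeated synchronized block reads.

    Hypothetical model: each read needs block_bytes * 8 / link_bps seconds
    on an uncongested link, and with probability p_timeout it additionally
    stalls for one retransmission timeout of rto_s seconds.
    """
    if not 0.0 <= p_timeout <= 1.0:
        raise ValueError("p_timeout must be in [0, 1]")
    bits = block_bytes * 8
    ideal_time = bits / link_bps
    expected_time = ideal_time + p_timeout * rto_s
    return bits / expected_time


def utilization(link_bps, block_bytes, rto_s, p_timeout):
    """Fraction of the link rate achieved under the model above."""
    return expected_goodput_bps(link_bps, block_bytes, rto_s, p_timeout) / link_bps
```

Under these assumptions, a 1 MB block on a 1 Gbit/s link takes 8 ms to transfer, so a 200 ms stall on one read in five (`utilization(1e9, 1_000_000, 0.2, 0.2)`) already cuts utilization to about one sixth of the link rate.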
Second, based on the quantitative analysis of the causes of Incast, the implementation of TCP's minimum retransmission timeout (RTOmin) timer is optimized, both to limit the cost of TCP timeouts under an overly large RTOmin and to avoid the spurious retransmissions triggered by an overly small one. Because existing TCP congestion control is designed for general-purpose use, protocol implementations set RTOmin at too coarse a granularity for today's high-speed cluster storage networks. Linux kernels after version 2.6.18 support high-resolution timers, which makes it possible to tune RTOmin in the TCP implementation so that the effects of TCP timeouts and spurious retransmissions on cluster storage network throughput are balanced.
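To make the role of RTOmin concrete, the sketch below follows the standard retransmission timer computation from RFC 6298 (smoothed RTT plus four times the RTT variance, clamped from below), with the clock-granularity term omitted for brevity. It is an illustration, not the dissertation's kernel change: the names `RtoEstimator` and `final_rto` and the sample RTT values are assumptions, as is the 1 ms alternative floor mentioned afterwards; 200 ms is the default minimum RTO in Linux.

```python
class RtoEstimator:
    """RTO estimator in the style of RFC 6298, with a configurable floor."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance

    def __init__(self, rto_min_s):
        self.rto_min_s = rto_min_s
        self.srtt = None
        self.rttvar = None

    def update(self, rtt_sample_s):
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement: SRTT = R, RTTVAR = R / 2.
            self.srtt = rtt_sample_s
            self.rttvar = rtt_sample_s / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            deviation = abs(self.srtt - rtt_sample_s)
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * deviation
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt_sample_s
        # The floor (RTOmin) wins whenever the estimate is smaller.
        return max(self.rto_min_s, self.srtt + 4 * self.rttvar)


def final_rto(rto_min_s, rtt_samples_s):
    """Run all samples through a fresh estimator and return the last RTO."""
    estimator = RtoEstimator(rto_min_s)
    rto = rto_min_s
    for sample in rtt_samples_s:
        rto = estimator.update(sample)
    return rto
```

With RTT samples around 100 microseconds, the unclamped estimate stays well under a millisecond, so the effective RTO is simply the floor: `final_rto(0.2, [100e-6] * 10)` returns the 200 ms default, while a hypothetical 1 ms floor would bring the stall after a loss much closer to the actual round-trip time.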
Third, load control is applied at the application layer to limit the burst rate of each storage node during synchronized reads, which avoids the TCP timeouts caused by transient bursts of packet loss and thereby resolves the Incast problem. With support from the network interface load control module in the Linux kernel, a control script passes in the load control parameters, capping the maximum rate of the synchronized concurrent transfers from multiple storage nodes. This prevents network congestion from building up and so avoids the repeated TCP timeouts that follow a transient burst of packet loss. The core idea of the load control strategy is that the storage nodes taking part in a synchronized transfer share the bandwidth of the bottleneck link equally; that is, the maximum rate of each synchronized flow must not exceed the average bandwidth it should be allocated within the cluster.
Finally, a distributed continuous data protection system with typical Incast load characteristics is studied, analyzing both the Incast behavior of its network load and the load on its local disks. For network Incast, RTOmin optimization is combined with load control to raise the client's actual throughput and shorten the network transfer time of data requests. For the local disk load on the check server, a buffer chain strategy reduces the number of disk I/O operations the check server issues and the time spent waiting on data check computation. Together the two optimizations improve the network transfer efficiency and local I/O performance of the cluster storage system and lower the overall response time of user data services. By analyzing the causes of Incast and studying solutions to it, this work helps ensure high-quality storage service in large-scale enterprise data centers.
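The equal-share rule of the load control strategy (each synchronized flow capped at its average share of the bottleneck link) can be sketched as follows. This is a minimal illustration rather than the dissertation's control script: the function and class names are hypothetical, and a user-space token bucket stands in for the kernel's network interface load control module, whose configuration the abstract does not describe.

```python
def per_flow_rate_cap_bps(bottleneck_bps, n_senders):
    """Equal share of the bottleneck link for each synchronized sender."""
    if n_senders < 1:
        raise ValueError("need at least one sender")
    return bottleneck_bps / n_senders


class TokenBucket:
    """Token bucket that caps a sender at rate_bps with a bounded burst."""

    def __init__(self, rate_bps, burst_bits, now_s=0.0):
        self.rate_bps = rate_bps
        self.burst_bits = burst_bits
        self.tokens = burst_bits  # start with a full bucket
        self.last_s = now_s

    def try_send(self, size_bits, now_s):
        """Return True and consume tokens if size_bits may be sent at now_s."""
        elapsed = now_s - self.last_s
        self.last_s = now_s
        # Refill at the capped rate, never beyond the burst allowance.
        self.tokens = min(self.burst_bits, self.tokens + elapsed * self.rate_bps)
        if size_bits <= self.tokens:
            self.tokens -= size_bits
            return True
        return False
```

For example, eight storage nodes behind a 1 Gbit/s client link would each be capped at `per_flow_rate_cap_bps(1e9, 8)`, that is 125 Mbit/s. Keeping the burst allowance small (here an assumed single 1500-byte frame, 12000 bits) is what prevents the senders from overflowing the switch buffer at the same instant.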
[Degree-granting institution]: Huazhong University of Science and Technology
[Degree level]: Doctoral
[Year conferred]: 2012
[Classification number]: TP333
Article ID: 2106494
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2106494.html