重復(fù)數(shù)據(jù)刪除技術(shù)中的并行性能優(yōu)化算法研究
發(fā)布時(shí)間:2019-05-28 15:11
【摘要】:互聯(lián)網(wǎng)技術(shù)的不斷發(fā)展以及新興網(wǎng)絡(luò)應(yīng)用的出現(xiàn),各種重要數(shù)據(jù)正在以PB級(jí)(千萬億字節(jié))的規(guī)模逐年增長。重復(fù)數(shù)據(jù)刪除技術(shù)通過對(duì)存儲(chǔ)數(shù)據(jù)流中冗余數(shù)據(jù)的定位與消除實(shí)現(xiàn)了存儲(chǔ)資源的高效利用,使其在網(wǎng)絡(luò)存儲(chǔ)和備份系統(tǒng)中發(fā)揮著在越來越重要的作用。作為一種計(jì)算密集型和I/O密集型的應(yīng)用,當(dāng)存儲(chǔ)數(shù)據(jù)量不斷增長時(shí),重復(fù)數(shù)據(jù)刪除系統(tǒng)的性能主要受到數(shù)據(jù)塊哈希計(jì)算和磁盤索引查重兩方面的影響。近年來,隨著多核(multi-core)和眾核(many-core)處理器的普及,如何在并行環(huán)境中實(shí)現(xiàn)高效的數(shù)據(jù)塊哈希計(jì)算和磁盤索引查重,已經(jīng)成為重復(fù)數(shù)據(jù)刪除性能優(yōu)化研究中的熱點(diǎn)問題。 目前,重復(fù)數(shù)據(jù)刪除技術(shù)并行加速方法主要圍繞著基于GPU的并行計(jì)算和基于多線程的并行磁盤索引查重這兩種思想展開。然而,隨著并行規(guī)模的增加,現(xiàn)有算法存在著一定的性能提升瓶頸,嚴(yán)重影響了當(dāng)前算法的并行可擴(kuò)展性。研究通過對(duì)GPU并行計(jì)算和并行查重算法性能評(píng)價(jià)模型的建立與分析,導(dǎo)出了并行規(guī)模擴(kuò)大下影響系統(tǒng)性能提升的主要因素。針對(duì)兩種方法中存在的性能瓶頸,提出了相關(guān)的優(yōu)化算法并通過實(shí)際數(shù)據(jù)的測(cè)試證明了算法先進(jìn)性。 在并行系統(tǒng)的性能優(yōu)化研究中,需要對(duì)系統(tǒng)中主要處理環(huán)節(jié)對(duì)整體性能的影響進(jìn)行詳細(xì)的量化分析從而找出系統(tǒng)的性能瓶頸。針對(duì)GPU并行計(jì)算和并行索引表查重兩種主要的并行算法,結(jié)合其具體處理流程以及其中存在的并發(fā)、資源共亨、競(jìng)爭(zhēng)等因素,研究提出了兩種基于隨機(jī)Petri網(wǎng)的性能分析模型。通過對(duì)模型中各處理環(huán)節(jié)的處理速率和系統(tǒng)利用率的推導(dǎo)計(jì)算,得出了影響系統(tǒng)性能主要因素,為并行算法的優(yōu)化研究提供了理論基礎(chǔ)。同時(shí),在后續(xù)的優(yōu)化方法的研究中,通過對(duì)實(shí)際數(shù)據(jù)的測(cè)試結(jié)果分析,分別證明了性能模型推導(dǎo)結(jié)果的正確性。 在GPU加速的并行分塊和指紋計(jì)算的處理方法中,主存儲(chǔ)器和GPU全局存儲(chǔ)器之間的數(shù)據(jù)傳輸延時(shí)已經(jīng)成為影響整體計(jì)算性能的瓶頸。在傳統(tǒng)GPU加速算法中,不同計(jì)算環(huán)節(jié)中對(duì)相同數(shù)據(jù)的重復(fù)傳輸會(huì)造成多余的數(shù)據(jù)傳輸開銷。為此,提出了一種優(yōu)化的GPU加速算法,該方法通過對(duì)傳統(tǒng)方法中數(shù)據(jù)處理步驟的優(yōu)化處理,能夠有效地減少相同數(shù)據(jù)在主存和GPU之間的重復(fù)傳輸,從而有效地減輕數(shù)據(jù)傳輸時(shí)延對(duì)系統(tǒng)整體性能的影響。 由于數(shù)據(jù)的唯一性的需求,為了防止并行查重線程間的數(shù)據(jù)沖突,在并行磁盤索引查重方法中需要設(shè)計(jì)一種線程同步機(jī)制。隨著并行規(guī)模的擴(kuò)大,現(xiàn)有并行查重方法中采用的鎖機(jī)制帶來了巨大的一致性開銷,嚴(yán)重影響了并行查重算法的整體性能。為減少并行查詢的一致性開銷,同時(shí)減少磁盤索引的查詢延遲,結(jié)合分布式索引表的特點(diǎn),提出了一種基于數(shù)據(jù)指紋后綴的并行索引查詢優(yōu)化方法。該方法根據(jù)數(shù)據(jù)指紋的哈希后綴將不同的數(shù)據(jù)指紋的查詢?nèi)蝿?wù)加載于不同的查詢線程以減少并行線程間的一致性開銷。實(shí)驗(yàn)表明,該方法能夠有效地減少系統(tǒng)I/O延時(shí),相對(duì)于傳統(tǒng)方法而言,能夠有效地提高系統(tǒng)整體性能。
[Abstract]:With the development of Internet technology and the emergence of new network applications, all kinds of important data are increasing year by year with the scale of petabytes (millions of bytes). Deduplication technology has realized the efficient utilization of the storage resource by locating and eliminating the redundant data in the storage data stream, making it play an increasingly important role in the network storage and backup system. As a compute-intensive and I/ O-intensive application, the performance of the deduplication system is mainly affected by the data block hash calculation and the disk index check when the amount of storage data is increasing. In recent years, with the popularization of multi-core and many-core processors, how to realize the efficient data block hash calculation and the disk index check in the parallel environment has become a hot issue in the research of repeated data deletion performance optimization. At present, the parallel acceleration method of the repeated data deletion technology is mainly about the parallel computation based on the GPU and the parallel disk index based on the multi-thread. On the other hand, with the increase of the parallel scale, the existing algorithm has a certain performance improvement bottleneck, which seriously affects the parallel scalability of the current algorithm. In this paper, through the establishment and analysis of the performance evaluation model of the GPU parallel computing and parallel checking algorithm, the main cause of the performance improvement of the system under the scale of the parallel scale is derived. In order to solve the performance bottleneck in two methods, the related optimization algorithm is put forward and the algorithm is advanced through the test of the actual data. In the performance optimization of the parallel system, the influence of the main processing links on the overall performance of the system needs to be quantified and analyzed in detail, so as to find out the system's performance This paper presents two main parallel algorithms for GPU parallel computation and parallel index table, combining with the specific processing flow and the factors such as concurrency, resource co-existence and competition, and puts forward two performance points based on the stochastic Petri net. Based on the calculation of the processing rate and the system utilization rate of each processing link in the model, the main factors that affect the performance of the system are obtained, and the optimization research of the parallel algorithm is provided. On the basis of the research of the following optimization methods, the result of the performance model is proved through the analysis of the test results of the actual data. Correctness. Data transmission delay between main memory and GPU global memory has become the whole calculation in the processing method of GPU acceleration parallel block and fingerprint calculation The bottleneck of performance. In the traditional GPU acceleration algorithm, the repeated transmission of the same data in different computing links can cause redundancy according to the transmission overhead, an optimized GPU acceleration algorithm is proposed, which can effectively reduce the repeated transmission of the same data between the main memory and the GPU by optimizing the data processing steps in the conventional method, thereby effectively reducing the data transmission time delay and the whole system as a whole, The effect of the performance is affected by the uniqueness of the data. In order to prevent the data collision among the parallel check-in threads, it is necessary to design one in the parallel disk index checking method. With the expansion of the parallel scale, the lock mechanism adopted in the existing parallel-checking method brings huge consistency overhead, which seriously affects the parallel checking. In order to reduce the consistency of the parallel query, the query delay of the disk index is reduced, and the parallel cable based on the data fingerprint suffix is proposed in combination with the characteristics of the distributed index table. The method comprises the following steps of: loading the query tasks of different data fingerprints on different query threads according to the hash suffix of the data fingerprint so as to reduce the parallel thread; The experiment shows that the method can effectively reduce the I/ O delay of the system.
【學(xué)位授予單位】:華中科技大學(xué)
【學(xué)位級(jí)別】:博士
【學(xué)位授予年份】:2013
【分類號(hào)】:TP333
本文編號(hào):2487159
[Abstract]:With the development of Internet technology and the emergence of new network applications, all kinds of important data are increasing year by year with the scale of petabytes (millions of bytes). Deduplication technology has realized the efficient utilization of the storage resource by locating and eliminating the redundant data in the storage data stream, making it play an increasingly important role in the network storage and backup system. As a compute-intensive and I/ O-intensive application, the performance of the deduplication system is mainly affected by the data block hash calculation and the disk index check when the amount of storage data is increasing. In recent years, with the popularization of multi-core and many-core processors, how to realize the efficient data block hash calculation and the disk index check in the parallel environment has become a hot issue in the research of repeated data deletion performance optimization. At present, the parallel acceleration method of the repeated data deletion technology is mainly about the parallel computation based on the GPU and the parallel disk index based on the multi-thread. On the other hand, with the increase of the parallel scale, the existing algorithm has a certain performance improvement bottleneck, which seriously affects the parallel scalability of the current algorithm. In this paper, through the establishment and analysis of the performance evaluation model of the GPU parallel computing and parallel checking algorithm, the main cause of the performance improvement of the system under the scale of the parallel scale is derived. In order to solve the performance bottleneck in two methods, the related optimization algorithm is put forward and the algorithm is advanced through the test of the actual data. In the performance optimization of the parallel system, the influence of the main processing links on the overall performance of the system needs to be quantified and analyzed in detail, so as to find out the system's performance This paper presents two main parallel algorithms for GPU parallel computation and parallel index table, combining with the specific processing flow and the factors such as concurrency, resource co-existence and competition, and puts forward two performance points based on the stochastic Petri net. Based on the calculation of the processing rate and the system utilization rate of each processing link in the model, the main factors that affect the performance of the system are obtained, and the optimization research of the parallel algorithm is provided. On the basis of the research of the following optimization methods, the result of the performance model is proved through the analysis of the test results of the actual data. Correctness. Data transmission delay between main memory and GPU global memory has become the whole calculation in the processing method of GPU acceleration parallel block and fingerprint calculation The bottleneck of performance. In the traditional GPU acceleration algorithm, the repeated transmission of the same data in different computing links can cause redundancy according to the transmission overhead, an optimized GPU acceleration algorithm is proposed, which can effectively reduce the repeated transmission of the same data between the main memory and the GPU by optimizing the data processing steps in the conventional method, thereby effectively reducing the data transmission time delay and the whole system as a whole, The effect of the performance is affected by the uniqueness of the data. In order to prevent the data collision among the parallel check-in threads, it is necessary to design one in the parallel disk index checking method. With the expansion of the parallel scale, the lock mechanism adopted in the existing parallel-checking method brings huge consistency overhead, which seriously affects the parallel checking. In order to reduce the consistency of the parallel query, the query delay of the disk index is reduced, and the parallel cable based on the data fingerprint suffix is proposed in combination with the characteristics of the distributed index table. The method comprises the following steps of: loading the query tasks of different data fingerprints on different query threads according to the hash suffix of the data fingerprint so as to reduce the parallel thread; The experiment shows that the method can effectively reduce the I/ O delay of the system.
【學(xué)位授予單位】:華中科技大學(xué)
【學(xué)位級(jí)別】:博士
【學(xué)位授予年份】:2013
【分類號(hào)】:TP333
【參考文獻(xiàn)】
相關(guān)期刊論文 前1條
1 林闖,李雅娟,王忠民;性能評(píng)價(jià)形式化方法的現(xiàn)狀和發(fā)展[J];電子學(xué)報(bào);2002年S1期
相關(guān)博士學(xué)位論文 前1條
1 楊鵬;基于廣義隨機(jī)Petri網(wǎng)理論的SIP的研究[D];蘭州理工大學(xué);2009年
,本文編號(hào):2487159
本文鏈接:http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2487159.html
最近更新
教材專著