Design and Implementation of a GPU-Accelerated MapReduce Cluster
Published: 2018-09-04 15:22
[Abstract]: Processing massive data faster is a perennial goal of data center computing. As data volumes grow explosively and applications demand ever more timely results, the pressure on data processing keeps rising, which forces improvements to the existing software and hardware architectures for large-scale data processing. MapReduce, a distributed parallel computing model, is widely used in enterprise big data computing. In recent years researchers have explored the performance potential of the MapReduce model from many angles, and hardware-accelerated MapReduce is one novel approach. This thesis presents a MapReduce implementation platform accelerated by graphics processing units (GPUs). A GPU is a highly parallel many-core processor that can launch thousands of threads concurrently and significantly increase computing speed. Heterogeneous coprocessors, with the GPU as the leading example, are now widely accepted in fields such as high-performance computing. On this basis, we try to combine the computing power of the GPU with the strengths of the MapReduce model in data-intensive applications to build a high-performance, GPU-accelerated MapReduce cluster.
The work and results of this thesis are as follows:
1. We designed and implemented a GPU-accelerated MapReduce framework, the GAMR cluster system.
2. We propose a GPU-based parallel sorting algorithm and apply it in the GAMR cluster system, which makes sorting during job execution 3 to 8 times faster.
3. We analyze the data flow of MapReduce jobs in detail and derive a formal quantitative performance model for MapReduce, so that the performance of a job can be evaluated by formula.
4. We propose an automated MapReduce cluster performance optimization method based on the conjugate gradient optimization algorithm, which reduces the workload of cluster operations staff.
The core idea of this work is to extend the parallelism of the MapReduce model from coarse-grained multi-computer parallelism between nodes to fine-grained many-core parallelism within each node, using heterogeneous coprocessors to improve the performance of the MapReduce runtime environment. Experiments show that, compared with other MapReduce implementations, MapReduce jobs running on the GAMR cluster achieve a speedup of about 5 times.
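The abstract does not describe the GPU sorting algorithm itself. To show why the sort phase of a job suits many-core hardware, the sketch below uses a bitonic sorting network, a well-known data-parallel sort. This is an assumption chosen for illustration, not the method used in GAMR. The function name `bitonic_sort` and the restriction to numeric keys are hypothetical, and the code is plain Python rather than GPU code.

```python
def bitonic_sort(values):
    """Sort numeric keys with a bitonic sorting network (illustrative only)."""
    n = len(values)
    # Pad to the next power of two with +infinity sentinels so the
    # network has a regular shape; the sentinels are dropped at the end.
    size = 1
    while size < n:
        size *= 2
    data = list(values) + [float("inf")] * (size - n)

    k = 2
    while k <= size:          # length of the bitonic sequences being merged
        j = k // 2
        while j >= 1:         # distance between the two compared slots
            # Within one pass, every (i, partner) pair is disjoint from the
            # others, so on a GPU each index i could be given its own thread.
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data[:n]


if __name__ == "__main__":
    keys = [42, 7, 19, 3, 88, 7, 0]
    print(bitonic_sort(keys))
```

The point of the design is that the compare-exchange pattern is fixed in advance and does not depend on the data, so each pass of the inner loop can run as one fully parallel step. A sequential sort such as quicksort lacks that property, which is one reason sorting networks are a common starting point for GPU sorting.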
[Degree-granting institution]: Yunnan University
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP338.6
Article ID: 2222570
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2222570.html