Design and Implementation of a GPU-Accelerated MapReduce Cluster
Published: 2018-09-04 15:22
[Abstract]: Processing massive data faster is a perennial goal of data center computing. As data volumes grow explosively and applications demand ever more timely processing, the pressure on data processing keeps rising, forcing improvements to the existing hardware and software architectures for large-scale data processing. MapReduce, a distributed parallel computing model, is widely used in enterprise big data computing. In recent years, researchers have explored the performance potential of the MapReduce model from many angles, and hardware-accelerated MapReduce is one novel approach. This thesis presents a MapReduce implementation platform accelerated by graphics processing units (GPUs). A GPU is a highly parallel many-core processor that can launch thousands of threads simultaneously and significantly increase computing speed. Heterogeneous coprocessors, with the GPU as the leading example, are now widely accepted in fields such as high-performance computing. On this basis, we try to combine the computing power of the GPU with the strengths of the MapReduce model in data-intensive applications to build a GPU-accelerated, high-performance MapReduce cluster.

The work and results of this thesis are as follows:
1. We design and implement GAMR, a cluster system that serves as a GPU-accelerated MapReduce framework.
2. We propose a GPU-based parallel sorting algorithm and apply it in the GAMR cluster system, which increases sorting speed during the job running phase by a factor of 3 to 8.
3. We analyze the data flow of MapReduce jobs in detail and derive a formal, quantitative performance model for MapReduce, so that the performance of a MapReduce job can be evaluated by formula.
4. We propose an automated method for optimizing MapReduce cluster performance based on the conjugate gradient optimization algorithm, which reduces the workload of cluster operations staff.

The core idea of our work is to extend the parallelism of the MapReduce model from coarse-grained multi-computer parallelism across nodes to fine-grained many-core parallelism within a node, using heterogeneous coprocessors to improve the performance of the MapReduce runtime environment. Experiments show that, compared with other MapReduce implementations, MapReduce jobs running on the GAMR cluster achieve a speedup of about 5 times.
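To make the role of the sorting stage concrete, the following is a minimal single-process sketch of a MapReduce word-count job in which the sort between map and reduce is a pluggable function. It is an illustration only, not the GAMR implementation: the names `map_phase`, `cpu_sort`, `reduce_phase`, and `run_job` are hypothetical, and the default sorter runs on the CPU. In a GPU-accelerated design such as the one the abstract describes, a GPU-based parallel sort would presumably take the place of `cpu_sort`.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical types for this sketch; they are not part of GAMR.
Pair = Tuple[str, int]
Sorter = Callable[[List[Pair]], List[Pair]]


def map_phase(lines: Iterable[str]) -> Iterator[Pair]:
    """Emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield (word, 1)


def cpu_sort(pairs: List[Pair]) -> List[Pair]:
    """Reference sort by key on the CPU; a GPU sort could be swapped in here."""
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs: Iterable[Pair]) -> List[Pair]:
    """Sum the counts of each run of equal keys; input must be sorted by key."""
    return [
        (key, sum(count for _, count in group))
        for key, group in groupby(sorted_pairs, key=itemgetter(0))
    ]


def run_job(lines: Iterable[str], sorter: Sorter = cpu_sort) -> List[Pair]:
    """Run map, then the pluggable sort, then reduce."""
    intermediate = list(map_phase(lines))
    return reduce_phase(sorter(intermediate))


if __name__ == "__main__":
    # prints [('gpu', 3), ('map', 2), ('reduce', 1)]
    print(run_job(["gpu map reduce", "map gpu gpu"]))
```

Because the reduce step only requires that equal keys be adjacent, any sorter that groups keys correctly can be passed as `sorter` without changing the map or reduce code, which is why the sort stage is a natural place to offload work to a coprocessor.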
[Degree-granting institution]: Yunnan University
[Degree level]: Master's
[Year degree granted]: 2013
[Classification number]: TP338.6
Article ID: 2222570
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2222570.html