Performance Portability Analysis of OpenACC 2.0
Published: 2021-09-17 16:48
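In high-performance computing (HPC), application performance used to rise with processor performance under Moore's law: programmers enjoyed a "free lunch" from faster processors without rewriting their code. That trend can no longer continue because clock frequency has hit a power bottleneck, so HPC has turned to heterogeneous parallel computing with accelerators to obtain further performance gains. OpenACC is a directive-based standard for heterogeneous parallel programming. It lets programmers write code without dealing with the complex low-level architecture of heterogeneous accelerators, which makes heterogeneous parallel programming less difficult. In addition, an OpenACC compiler can generate parallel code for different platforms from this high-level programming model, so applications written in OpenACC have good cross-platform portability. Many supercomputers on the Top500 list already rely heavily on heterogeneous parallel computing as the source of their performance, for example Tianhe-2, Titan, and TSUBAME 2.5. HPC application developers therefore have to write different code, for example in CUDA, OpenCL, and OpenMP, to run parallel computations on supercomputers that use different accelerator devices. With the advantages described above, OpenACC offers a simple way to address this problem. This thesis mainly studies the performance portability of OpenACC applications on NVIDIA Kepler-architecture GPUs and Intel Knights Corner-architecture coprocessors...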
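As a rough illustration of what "directive-based" means in the abstract above, a minimal C sketch of a loop annotated with an OpenACC directive might look like the following. It is not taken from the thesis: the function name `saxpy` and the choice of data clauses are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not from the thesis: computes y[i] = a * x[i] + y[i]. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* The directive asks an OpenACC compiler to offload the loop:
       x is copied to the device, y is copied in and copied back out.
       A compiler without OpenACC support ignores the pragma and
       runs the loop serially on the host. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The same source can in principle be built for different accelerators by changing only the compiler target, which is the portability property the abstract refers to. Whether performance carries over as well is the question the thesis investigates.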
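[Source]: Shanghai Jiao Tong University, Shanghai (Project 211 university, Project 985 university, directly under the Ministry of Education)
[Pages]: 90
[Degree level]: Master's
[Table of contents]:
Abstract (in Chinese)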
ABSTRACT
List of Abbreviations
Chapter 1 Introduction
1.1 High Performance Computing (HPC)
1.2 Problem definition
1.2.1 Portability
1.2.2 Productivity
1.2.3 Performance
1.3 Objectives
1.4 Summary of contributions
1.5 Related works
1.5.1 The investigated work on GPU
1.5.2 The investigated work on MIC and a Hybrid system
Chapter 2 Programming in Heterogeneous System
2.1 The Architectures for Heterogeneous System
2.1.1 Graphics Processing Units (GPUs)
2.1.2 Intel Many Integrated Core (MIC)
2.2 Programming languages and frameworks
2.2.1 Parallel Thread Execution (PTX)
2.2.2 OpenACC
2.2.3 What is HMPP Codelet and how to get HMPP codelet, PTX, and MIC machine code files
2.2.4 Comparison of OpenACC and low-level programming model
2.2.5 OpenACC 2.0
Chapter 3 Methodologies and Tuning techniques
3.1 OpenCL optimization
3.2 OpenACC optimizations
3.2.1 Independent optimization
3.2.2 Directive organize optimization
3.2.3 ILP optimization
3.2.4 Grid Thread Mapping optimization
3.2.5 Compiler flags
3.2.6 Tiling optimization
Chapter 4 Experimental setup
4.1 Testbed machine
4.1.1 π Supercomputer
4.2 Benchmarks
4.2.1 Rodinia benchmark suite and selected benchmarks
4.2.2 Hydro Benchmark
4.2.3 EPCC Benchmark
4.3 Profiling tools
Chapter 5 Results and Discussion
5.1 Discussion of each optimization
5.1.1 Directive organize optimization and Simple restructuring
5.1.2 Independent optimization
5.1.3 ILP optimizations
5.1.4 Grid thread mapping optimization
5.1.5 Tiling optimization
5.1.6 Compiler flags
5.2 Analysis of each application
5.2.1 Breadth First Search (BFS)
5.2.2 Gaussian Elimination (GE)
5.2.3 Back Propagation (BP)
5.2.4 LU Decomposition (LUD)
5.2.5 Hydro Benchmark
Chapter 6 Conclusion
Chapter 7 Future Work
References
Publication
Article ID: 3399127
Article link: http://sikaile.net/shoufeilunwen/xixikjs/3399127.html