GPU-Based Parallel Implementation of TOUGHREACT
Published: 2018-08-24 19:20
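【Abstract】: In recent years, high-performance parallel computing has developed rapidly. Using new multi-core, many-core, and GPU computing platforms to efficiently simulate numerical models of physicochemical states under complex geological conditions has become a scientific topic of growing concern to geoscientists. With the emergence and rapid development of general-purpose GPU computing, more and more researchers are using GPUs to accelerate subsurface multiphase-flow simulation software in order to meet the demands of large-scale, high-accuracy applications. TOUGHREACT, developed at Lawrence Berkeley National Laboratory, is currently the most widely used program for simulating the coupled processes and mechanisms of subsurface multiphase fluid flow and geochemical reactive transport. However, TOUGHREACT runs inefficiently when simulating large-scale, high-accuracy problems in complex geological settings, such as geological storage of carbon dioxide. Accelerating TOUGHREACT simulations with GPU parallel computing therefore has considerable engineering significance and research value. To this end, this thesis parallelizes TOUGHREACT on a heterogeneous CPU-GPU computing platform.
First, the basic simulation workflow of the software is outlined on the basis of the relevant domain knowledge, and its modular structure is analyzed in detail with reference to existing research. A comparison of how the multiphase-flow module and the geochemical reactive-transport module differ during the solution process, weighing both the size of the linear systems and the concurrency of the iterative solution within each time step, shows that the multiphase-flow part of the simulation is better suited to parallel implementation on the GPU.
When many practical problems in the natural and social sciences are solved numerically, partial differential equations are commonly used as the numerical model expressing conservation of mass and energy, and once these equations are discretized, solving sparse linear systems is one of the main computational steps. For some large field-scale problems in particular, solving the sparse linear systems can take more than 80% of the run time. This thesis therefore compares the execution times of the individual modules in TOUGHREACT and makes the linear-system solver the focus of the parallelization work.
Because the coefficient matrices that arise in multiphase-flow problems are nonsymmetric and not positive definite, several biconjugate gradient methods from the Krylov subspace family are used to solve the systems. So as not to sacrifice solver efficiency, the preconditioner is not ported to the GPU; instead, the two most time-consuming parts of the solve, sparse matrix-vector multiplication (SpMV) and vector inner products, are implemented in CUDA. Once the mapping of each kernel function is settled, developing the CUDA program is not difficult, but several necessary optimizations improve its performance significantly. This thesis selects a suitable sparse-matrix storage format to reduce memory use and host-device data transfer; optimizes memory access by using shared memory and page-locked host memory and by merging kernels that execute in sequence, reducing global-memory accesses; optimizes the instruction stream by avoiding unnecessary synchronization and unrolling loops; and implements multiple kernel versions together with a thread-scale decision tree that organizes threads according to the problem size, making full use of the GPU's processors to balance the load.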
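The abstract stresses choosing a suitable sparse-matrix storage format and porting SpMV to CUDA, but does not name the format. As a minimal sketch, assuming compressed sparse row (CSR) storage (a common choice for GPU SpMV, not confirmed by the abstract), the Python below builds CSR arrays and emulates a scalar CSR kernel, in which each iteration of the outer loop stands in for one CUDA thread handling one matrix row. CSR keeps only the nonzero values, their column indices, and n + 1 row pointers, which is where savings in memory and host-device transfer would come from. The function names `to_csr` and `spmv_csr` are hypothetical.

```python
from typing import List, Tuple


def to_csr(dense: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense matrix to CSR arrays (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0.0:              # store nonzeros only
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))   # row i occupies values[row_ptr[i]:row_ptr[i+1]]
    return values, col_idx, row_ptr


def spmv_csr(values: List[float], col_idx: List[int], row_ptr: List[int],
             x: List[float]) -> List[float]:
    """y = A x in CSR form; each outer iteration plays the role of one GPU thread."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):         # CUDA: row = blockIdx.x * blockDim.x + threadIdx.x
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Small nonsymmetric example: 7 nonzeros out of 9 entries.
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -2.0, 5.0]]
vals, cols, ptr = to_csr(A)
print(ptr)                                         # [0, 2, 5, 7]
print(spmv_csr(vals, cols, ptr, [1.0, 2.0, 3.0]))  # [2.0, 4.0, 11.0]
```

Assigning one thread per row keeps the kernel simple; rows with very different nonzero counts would leave some threads idle, which is one reason GPU codes often keep several SpMV variants.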
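Finally, the parallel preconditioned biconjugate gradient solvers are integrated into TOUGHREACT. Practical problems of different sizes are simulated on the CPU-GPU platform to benchmark the parallel BiCG and parallel BiCGSTAB implementations. The experiments show that the parallel linear solver achieves a speedup of up to 3.4x over the serial CPU program, and the overall multiphase-flow simulation achieves a speedup of up to 2.8x. These results confirm that the parallelization strategy is sound and provide a solid foundation and useful experience for future GPU porting of the geochemical reactive-transport module.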
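The 80% share of run time spent in the linear solver and the reported speedups can be related through Amdahl's law: if a fraction f of the serial run time is accelerated by a factor s and the rest is unchanged, the overall speedup is S = 1 / ((1 - f) + f / s). The sketch below assumes, which the abstract does not state, that the 3.4x solver maximum and the 2.8x overall maximum come from the same run and that host-device transfers are counted inside the solver time. Under those assumptions the two figures imply that the solver took roughly 91% of the serial run time, consistent with the "more than 80%" figure. The function names are hypothetical.

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of serial run time is accelerated by s."""
    return 1.0 / ((1.0 - f) + f / s)


def solver_fraction(overall: float, solver: float) -> float:
    """Invert Amdahl's law for f, given overall and solver-only speedups."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / solver)


# If 80% of run time is in the linear solver, even an infinitely fast solver
# caps the overall speedup at 1 / 0.2 = 5x; a 3.4x solver gives about 2.30x.
print(f"{amdahl_speedup(0.8, float('inf')):.2f}")  # 5.00
print(f"{amdahl_speedup(0.8, 3.4):.2f}")           # 2.30

# Conversely, 3.4x (solver) and 2.8x (overall) imply the solver took ~91%.
print(f"{solver_fraction(2.8, 3.4):.3f}")          # 0.911
```

The same relation explains why the overall speedup stays below the solver speedup: the unported parts of the simulation, including the preconditioner, bound the achievable gain.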
[Abstract]: In recent years, high-performance parallel computing technology has developed rapidly. Using new multi-core, many-core, and GPU computing platforms to efficiently simulate numerical models of physical and chemical states under complex geological conditions has become a scientific topic of increasing concern to geoscientists. GPU technology is increasingly used to speed up subsurface multiphase-flow simulation software to meet the needs of large-scale, high-accuracy applications. TOUGHREACT, developed by Lawrence Berkeley National Laboratory, is the most widely used program for simulating the coupled processes and mechanisms of subsurface multiphase flow and geochemical reactive transport. However, TOUGHREACT runs inefficiently when simulating large-scale, high-accuracy problems in complex geological settings, such as geological storage of carbon dioxide. Therefore, it is of great engineering significance and research value to accelerate TOUGHREACT simulations with GPU parallel computing. For this purpose, this thesis parallelizes TOUGHREACT on a heterogeneous CPU-GPU computing platform.
First, the basic simulation process of the software is outlined using the relevant domain knowledge. Referring to existing research, the modular structure of the software is analyzed in detail. A comparison of the multiphase-flow and geochemical reactive-transport modules, considering the size of the linear systems and the concurrency of the iterative process within each time step, shows that the multiphase-flow simulation is more suitable for parallel implementation on the GPU platform.
Partial differential equations (PDEs) are often used as numerical models representing the conservation of mass and energy in the numerical solution of many practical problems in the natural and social sciences. In the discrete solution of PDEs, solving sparse linear systems is one of the main computational steps; especially when large field-scale problems are simulated, it can take more than 80% of the solution time. Therefore, this thesis compares the execution time of each module in TOUGHREACT and chooses the solution of the linear systems as the focus of the parallelization work.
Because the coefficient matrices encountered in multiphase-flow problems are nonsymmetric and not positive definite, several biconjugate gradient methods from the Krylov subspace family are used to solve the equations. To avoid sacrificing solver efficiency, the preconditioning step is not ported to the GPU; instead, the two most time-consuming parts, sparse matrix-vector multiplication (SpMV) and vector inner product operations, are implemented in CUDA. After the mapping of each kernel function is determined, developing the CUDA parallel program is not difficult, but some necessary optimizations can significantly improve its performance: choosing a suitable sparse-matrix storage format to reduce memory usage and host-device data transfer; optimizing memory access, using shared memory and page-locked memory and merging sequentially executed kernels to reduce global-memory access; optimizing the instruction stream, including avoiding unnecessary synchronization and loop unrolling; and implementing multi-version kernels with a thread-scale decision tree that organizes threads according to the problem size and makes full use of the GPU's processor resources to achieve load balancing.
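The inner-product and thread-organization optimizations can be illustrated with a two-stage reduction. The sketch below emulates in plain Python what a CUDA dot-product kernel commonly does: each thread accumulates a grid-stride slice of x·y, each block combines its threads' values with a shared-memory tree reduction, and the per-block partial sums are added in a second stage. `choose_launch_config` is a hypothetical stand-in for the thesis's thread-scale decision tree; its thresholds and the `max_blocks` cap are illustrative assumptions, not values from the thesis.

```python
from typing import List, Tuple


def choose_launch_config(n: int, max_blocks: int = 64) -> Tuple[int, int]:
    """Hypothetical thread-scale decision tree returning (threads_per_block, num_blocks).

    Thresholds and max_blocks are illustrative assumptions only.
    """
    if n <= 1024:
        threads = 64          # small vectors: small blocks
    elif n <= 65536:
        threads = 128
    else:
        threads = 256         # large vectors: larger blocks
    blocks = min(max_blocks, (n + threads - 1) // threads)
    return threads, max(blocks, 1)


def block_reduce(shared: List[float]) -> float:
    """Tree reduction over one block's shared-memory buffer (power-of-two length).

    Each pass mirrors one CUDA step in which threads with tid < stride add
    shared[tid + stride]; on a GPU each pass is followed by __syncthreads(),
    and the final passes are often unrolled.
    """
    stride = len(shared) // 2
    while stride > 0:
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2
    return shared[0]


def gpu_style_dot(x: List[float], y: List[float]) -> float:
    """Two-stage inner product: per-block partial sums, then a final sum."""
    n = len(x)
    threads, blocks = choose_launch_config(n)
    grid_size = threads * blocks
    partials: List[float] = []
    for b in range(blocks):
        shared = [0.0] * threads
        for t in range(threads):
            acc = 0.0
            i = b * threads + t
            while i < n:              # grid-stride loop
                acc += x[i] * y[i]
                i += grid_size
            shared[t] = acc
        partials.append(block_reduce(shared))
    return sum(partials)              # stage 2: on the host or in a second small kernel


x = [float(i) for i in range(1, 1001)]
print(choose_launch_config(len(x)))        # (64, 16)
print(gpu_style_dot(x, [2.0] * len(x)))    # 1001000.0
```

Capping the block count means large vectors are covered by the grid-stride loop rather than by more blocks, so the second-stage sum stays small regardless of problem size.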
Finally, the parallel preconditioned conjugate gradient solvers are integrated into TOUGHREACT. On the CPU-GPU platform, numerical simulations of practical problems of different scales are carried out, and the performance of the parallel BiCG and parallel BiCGSTAB implementations is tested. Experiments show that the parallel linear solver achieves a maximum speedup of 3.4x over the serial CPU program, and 2.8x for the whole multiphase-flow simulation. This result confirms the soundness of the parallelization strategy, lays a good foundation for further GPU porting of the geochemical reactive-transport module, and provides valuable experience for that work.
【Degree-granting institution】: Jilin University
【Degree level】: Master's
【Year conferred】: 2012
【Classification number】: TP338.6
Article number: 2201782
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2201782.html