Research on the Migration Efficiency of Heterogeneous Seismic Data Processing Clusters
Published: 2019-02-24 17:21
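[Abstract]: Wave-equation-based prestack depth migration can produce high-quality migration images of geologically complex areas and is an important tool in oil and gas exploration. However, prestack depth migration involves enormous data volumes and extremely high computational demands, which has limited its practical use. CPU-GPU heterogeneous clusters offer major advantages in performance, power consumption, cost, and heat dissipation, creating an opportunity to make prestack depth migration more widespread. Yet CPU-GPU heterogeneous systems differ greatly from the uniform, serial, and simple traditional CPU model in system composition, architecture, and programming model. Using heterogeneous computing resources efficiently therefore raises many problems and challenges.

This thesis first analyzes qualitatively, and measures quantitatively, the effects of non-uniform memory access and bus contention. The results show that poorly planned data paths, together with bus contention and saturation, significantly degrade communication performance and may become the bottleneck of I/O-intensive migration processing. Several strategies for avoiding these bottlenecks are then discussed and tested with numerical methods commonly used in migration processing. The optimized applications show improved performance and stability.

To fully exploit the computing potential of GPUs, the thesis dissects the CUDA model. It argues that viewing the GPU as a multithreaded SIMD processor is more helpful for understanding its nature and for developing efficient applications. For the Fermi architecture, microbenchmarks are used to probe several microarchitectural characteristics, providing a basis for in-depth performance optimization. Because the fast Fourier transform (FFT) is widely used in migration processing, the thesis then analyzes an already optimized GPU FFT routine in depth on the basis of the Fermi microarchitecture. Data prefetching and instruction rescheduling increase instruction-level parallelism. Although the number of threads decreases, performance still improves by 12%.

SIMD branch divergence can cause significant performance loss. To address this, the thesis proposes two software-level optimization strategies, "aggregation" and "extraction". Tests show that both improve performance for suitable branches:

- Aggregation raises the proportion of useful results produced by each SIMD step.
- Extraction shortens the divergent portion of SIMD execution.

Finally, tests on real migration processing lead to three conclusions:

- Sound data-path planning yields the largest speedup.
- In-depth optimization of hot GPU kernels also brings some improvement.
- SIMD branch optimization contributes relatively little to speeding up migration.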
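The thesis reports that poorly planned data paths and bus contention can bottleneck I/O-heavy migration. This record does not describe its avoidance strategies in detail. As a hedged illustration only, the Python sketch below models two generic measures on an assumed dual-socket node:

- Each GPU's host worker is pinned to the CPU socket that owns that GPU's PCIe link, so host buffers stay in local memory.
- Host-to-device transfers are batched so that no shared link carries more than a set number at once.

The `Gpu` record, its `socket`/`bus` topology fields, and the functions `plan_workers` and `schedule_transfers` are hypothetical names, not the author's code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    gpu_id: int
    socket: int  # CPU socket whose PCIe root complex hosts this GPU (assumed known)
    bus: str     # label of the PCIe link/switch this GPU shares with others (hypothetical)


def plan_workers(gpus, cores_per_socket):
    """Assign one host worker per GPU to a core on that GPU's own socket.

    Keeping the worker (and the host buffers it allocates) local to the GPU's
    socket avoids routing host-device copies across the inter-socket link.
    Returns {gpu_id: (socket, local_core_index)}.
    """
    next_core = defaultdict(int)
    plan = {}
    for g in gpus:
        core = next_core[g.socket]
        if core >= cores_per_socket:
            raise ValueError(f"no free core on socket {g.socket} for GPU {g.gpu_id}")
        next_core[g.socket] += 1
        plan[g.gpu_id] = (g.socket, core)
    return plan


def schedule_transfers(transfers, gpus, max_per_bus):
    """Greedily pack (gpu_id, nbytes) transfers into rounds so that no shared
    bus carries more than max_per_bus concurrent transfers in any round."""
    bus_of = {g.gpu_id: g.bus for g in gpus}
    rounds = []  # list of (batch, per-bus transfer count) pairs
    for gpu_id, nbytes in transfers:
        bus = bus_of[gpu_id]
        for batch, load in rounds:
            if load[bus] < max_per_bus:
                batch.append((gpu_id, nbytes))
                load[bus] += 1
                break
        else:  # every existing round already saturates this bus: open a new one
            rounds.append(([(gpu_id, nbytes)], defaultdict(int, {bus: 1})))
    return [batch for batch, _ in rounds]


# Hypothetical dual-socket node: GPUs 0-1 on socket 0, GPUs 2-3 on socket 1,
# each pair behind one shared PCIe link.
gpus = [Gpu(0, 0, "link0"), Gpu(1, 0, "link0"), Gpu(2, 1, "link1"), Gpu(3, 1, "link1")]
print(plan_workers(gpus, cores_per_socket=8))
# {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}
print(schedule_transfers([(0, 64), (1, 64), (2, 64), (3, 64)], gpus, max_per_bus=1))
# [[(0, 64), (2, 64)], [(1, 64), (3, 64)]]
```

On Linux, a plan like this could be applied by launching each worker under `numactl` or by calling Python's `os.sched_setaffinity`. Mapping a (socket, local core) pair to a global CPU number depends on the machine, so that step is left out of the sketch.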
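The abstract does not give the exact form of "aggregation" and "extraction". The following Python cost model is a hedged sketch of one plausible reading. A warp executes `if p: A else: B` in lockstep, so a warp whose lanes disagree pays for both paths. The model treats the two strategies as follows:

- Aggregation regroups work items so that items taking the same branch share a warp.
- Extraction hoists instructions common to both paths out of the branch.

The cost parameters and the function names `warp_steps`, `simulate`, and `aggregate` are assumptions for illustration. The model also ignores the cost of the regrouping itself.

```python
def warp_steps(preds, cost_a, cost_b, shared=0):
    """SIMD steps one warp needs for `if p: A else: B` executed in lockstep.

    cost_a / cost_b: instructions on each path (including any common part).
    shared: instructions common to both paths that have been "extracted",
    i.e. hoisted out of the branch and run once by all lanes together.
    """
    steps = shared
    if any(preds):        # at least one lane takes path A
        steps += cost_a - shared
    if not all(preds):    # at least one lane takes path B
        steps += cost_b - shared
    return steps


def simulate(preds, warp_size, cost_a, cost_b, shared=0):
    """Return (total SIMD steps, SIMD efficiency) over consecutive warps.

    Efficiency = useful lane-instructions / (warp_size * SIMD steps).
    """
    steps = useful = 0
    for start in range(0, len(preds), warp_size):
        warp = preds[start:start + warp_size]
        steps += warp_steps(warp, cost_a, cost_b, shared)
        useful += sum(cost_a if p else cost_b for p in warp)
    return steps, useful / (warp_size * steps)


def aggregate(items, pred):
    """Model of "aggregation": reorder work items so that items taking the same
    branch end up in the same warps (a stable partition by branch outcome)."""
    return sorted(items, key=pred)


def takes_a(x):
    return x % 2 == 0  # alternating outcomes: every 32-lane warp diverges


items = list(range(64))
base = [takes_a(x) for x in items]
grouped = [takes_a(x) for x in aggregate(items, takes_a)]

print(simulate(base, 32, 10, 10))            # (40, 0.5)   divergent baseline
print(simulate(grouped, 32, 10, 10))         # (20, 1.0)   after aggregation
print(simulate(base, 32, 10, 10, shared=4))  # (32, 0.625) after extraction
```

In this toy case, aggregation removes divergence entirely because the branch outcomes split evenly into whole warps. With real data, and with the cost of regrouping on the GPU included (commonly a prefix-sum-based compaction), the gain would be smaller.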
[Degree-granting institution]: China University of Petroleum (East China)
[Degree level]: Master's
[Year awarded]: 2013
[Classification number]: P631.44; TP332
Article No.: 2429761
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2429761.html