
Performance Optimization of Non-Equilibrium Ionization Astronomical Numerical Simulations

Published: 2018-05-29 22:10

  Topic: astronomical simulation + non-equilibrium ionization; source: Tianjin University, doctoral dissertation, 2016


[Abstract]: Modern astronomical numerical simulations are computationally intensive and long-running. Improving simulation performance, reducing the consumption of computing resources, and striking the best balance between accuracy and performance have long been key goals in the design of astronomical simulation software. Building a complete and effective radiation-hydrodynamics simulation under non-equilibrium ionization (NEI) conditions remains one of the hard problems in the field, and the traditional NEI solution procedure based on an Eulerian grid incurs heavy costs in memory, computation, and network communication. Building on a review of earlier work, this dissertation starts from the key factors that limit the performance of NEI simulations and, targeting multi-core heterogeneous architectures and large-scale parallel environments, optimizes NEI simulation at three levels: workflow, framework structure, and numerical solution.

First, the dissertation analyzes and verifies the performance bottlenecks of the traditional algorithm, introduces tracer particles to decouple the underlying adaptive mesh from the NEI calculation layered above it, and then redesigns the parallel NEI solution framework on the MapReduce model. To handle snapshot generation, storage, and trajectory reconstruction for the large number of particles that the new framework brings with it, serial I/O, parallel I/O, direct I/O, and real-time stream-processing modes are designed so that the framework can adapt to different computing environments and requirements. Experiments show that this framework-level optimization overcomes the performance bottleneck of NEI simulation at large parallel scale: in the same experimental environment, it achieves a more than threefold performance improvement while using only one quarter of the original computing resources.

Second, to break through the limits that a traditional CPU architecture imposes when solving large numbers of NEI equations, the dissertation proposes an NEI solver for hybrid heterogeneous systems with multiple CPUs and multiple GPUs. In the algorithm design, a task-scheduling strategy based on shared memory and a task queue exploits the respective strengths of the CPU and the GPU and raises overall resource utilization. The data structures, task granularity, and memory access patterns of the algorithm are also optimized specifically for the CUDA programming model. Tests show that the heterogeneous solver significantly improves the efficiency of solving the NEI equations, reaching a speedup of about 15 with 4 GPU devices.

Finally, the dissertation uses visual analytics and computational steering to optimize the workflow of astronomical simulation. Based on the hierarchical structure of the adaptive mesh, fast, low-accuracy combined analysis is used to guide time-consuming, high-accuracy simulation runs. In line with the characteristics of astronomical numerical simulation, parameter classification and adjustment interfaces are designed to help astronomers understand and control the simulation process efficiently and accurately. The proposed visual steering environment effectively accelerates model building, ionization-state analysis, and related stages in the life cycle of an NEI simulation, and supports user decisions on parameter adjustment and numerical error control.

All experiments in the dissertation are based on real astronomical numerical simulations, and the accuracy of all experimental results is compared and analyzed in detail. The applicability of these methods to other problems is also analyzed and verified in detail: the related experiments show that the methods can likewise substantially improve the performance of common astronomical calculations such as nucleosynthesis and spectral calculation, and can accelerate the exploration and establishment of stellar wind models.
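The tracer-particle idea in the abstract can be illustrated with a small MapReduce-style sketch. It is not the dissertation's framework: it assumes a two-state ion balance (neutral and ionized), made-up rate coefficients, and hypothetical field names (`initial_fractions`, `final_cell`, `trajectory`). The map step integrates the ion balance along each particle's recorded trajectory independently of the mesh, and the reduce step averages the results per grid cell.

```python
from collections import defaultdict

# Hypothetical toy rate coefficients in cm^3/s (assumption for illustration;
# real NEI codes read these from atomic-data tables).
def ionization_rate(temperature):
    return 1.0e-9 if temperature > 1.0e5 else 1.0e-12

def recombination_rate(temperature):
    return 1.0e-12

def map_particle(particle):
    """Map step: integrate a two-state ion balance along one tracer trajectory.

    Returns (cell_id, (neutral_fraction, ionized_fraction)).
    """
    neutral, ionized = particle["initial_fractions"]
    for dt, n_e, temperature in particle["trajectory"]:
        total = neutral + ionized
        c = n_e * ionization_rate(temperature)      # ionizations per second
        r = n_e * recombination_rate(temperature)   # recombinations per second
        # Backward Euler for d(ionized)/dt = c * neutral - r * ionized,
        # with neutral = total - ionized; stable for stiff rates.
        ionized = (ionized + c * dt * total) / (1.0 + (c + r) * dt)
        neutral = total - ionized
    return particle["final_cell"], (neutral, ionized)

def reduce_by_cell(pairs):
    """Reduce step: average the ion fractions of all particles in each cell."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for cell, (neutral, ionized) in pairs:
        acc = sums[cell]
        acc[0] += neutral
        acc[1] += ionized
        acc[2] += 1
    return {cell: (a / n, b / n) for cell, (a, b, n) in sums.items()}

# Each trajectory sample is (time step in s, electron density in cm^-3, temperature in K).
hot_path = [(1.0e8, 1.0, 1.0e6)] * 10
cold_path = [(1.0e8, 1.0, 1.0e4)] * 10
particles = [
    {"initial_fractions": (1.0, 0.0), "final_cell": 0, "trajectory": hot_path},
    {"initial_fractions": (1.0, 0.0), "final_cell": 0, "trajectory": hot_path},
    {"initial_fractions": (1.0, 0.0), "final_cell": 1, "trajectory": cold_path},
]

cell_fractions = reduce_by_cell(map(map_particle, particles))
for cell, (neutral, ionized) in sorted(cell_fractions.items()):
    print(f"cell {cell}: neutral={neutral:.4f} ionized={ionized:.4f}")
```

Because `map_particle` touches only one particle's own history, the map step can be distributed across workers without mesh communication, which is the decoupling the abstract describes; the built-in `map` here stands in for whatever parallel executor a real framework would use.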
[Degree-granting institution]: Tianjin University
[Degree level]: Doctorate
[Year degree granted]: 2016
[Classification number]: P11




Article ID: 1952626


Link to this article: http://sikaile.net/shoufeilunwen/jckxbs/1952626.html

