Research on Hybrid Parallel Molecular Dynamics Computation on Multi-core Clusters
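Keywords: hybrid programming model; multi-core cluster; molecular dynamics; MPI; OpenMP. Source: University of Electronic Science and Technology of China, doctoral dissertation, 2012. Document type: dissertation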
[Abstract]: With the rapid development of high-performance computers and the growing abundance of computing resources, high-performance computing has become a focus of research worldwide. Because the mainstream architecture of high-performance computers has shifted from massively parallel processors to multi-core clusters, and systems have shifted from a single memory model to a hybrid memory model, parallel programs designed for these machines must adapt to the change; this gave rise to hybrid parallel programming models. Molecular dynamics (MD) simulation is an important method of scientific research and is widely used across many disciplines. Further speeding up MD simulation on multi-core clusters, and thereby advancing research in those fields, has become pressing. However, when designing parallel MD algorithms and other parallel algorithms based on hybrid parallel programming models on multi-core clusters, developers commonly find that introducing multithreaded parallelism incurs excessive overhead, so that the hybrid model often performs worse than the original pure message-passing model. Solving this problem and raising the speed of scientific and engineering codes on multi-core clusters is therefore an important direction of current research.

This thesis systematically surveys the state of research on hybrid parallel programming models and hybrid parallel MD algorithms, identifies their shortcomings, and on that basis proposes a series of optimized or improved algorithms for the related problems.

The main contents and contributions of the thesis are as follows:

(1) It analyzes in depth the basic principles and basic implementation methods of the hybrid parallel programming models and parallel MD algorithms suited to multi-core clusters, laying the foundation for the hybrid parallel MD algorithms proposed later.

(2) It examines the scalability problem of multithreaded parallel MD computation with the Critical Section algorithm. Theoretical analysis and experimental results show that the speedup of the Critical Section algorithm drops markedly when the number of processor cores exceeds 8. The thesis then proposes an optimization called the triangle parallel MD algorithm, which statically assigns atom sets so that the threads enter the critical section at different times, reducing the idle time of the critical section and speeding up the parallel computation.

(3) It proposes an OpenMP-based parallel MD algorithm, the SPMD-like (Single Program Multiple Data) algorithm. The algorithm adopts the same strategy as an SPMD program, in which each thread processes its own data and redundantly computes the data relationships that cross region boundaries, but its implementation stays close to a simple OpenMP implementation: the internal computational logic of MD does not need to change, and only a few data structures are modified and one spatial decomposition subroutine is added. The algorithm keeps the simplicity of an OpenMP implementation while achieving parallel performance and scalability close to those of the pure message-passing model.

(4) It proposes a parallel MD algorithm for multi-core clusters based on the hybrid MPI/OpenMP model. Following the principle of modifying as little as possible, the algorithm embeds the SPMD-like algorithm in a pure MPI parallel MD program. The hybrid program uses OpenMP parallelism within each node; it introduces little parallel overhead while markedly reducing inter-node communication time, which effectively improves the computing speed and parallel efficiency of the MD program on multi-core clusters.

(5) It proposes a reduction algorithm that avoids critical sections entirely, the block rotation reduction algorithm. The algorithm retains a simplicity similar to that of the Critical Section algorithm while offering better parallel performance and scalability. Theoretical analysis and experimental tests show that the algorithm performs well when the number of processor cores in a node is 16, but that at 32 cores or more its performance falls below that of the SPMD-like algorithm. The two algorithms therefore suit different hybrid parallel settings: when the number of cores in a node is small, the simpler block rotation reduction can be chosen; when the number of cores is large, the better-performing SPMD-like algorithm can be used.

(6) It proposes a parallel MD algorithm based on the hybrid MPI/TBB model and studies its implementation using LAMMPS as an example. Experimental tests show that once the number of nodes participating in the computation grows beyond a certain point, the hybrid model achieves better parallel performance than the pure MPI model, mainly because communication time is reduced.
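The contrast drawn in item (5), between a reduction guarded by a critical section and one that avoids critical sections, can be illustrated with a small sketch. The abstract does not describe how block rotation reduction works internally, so what follows is only one plausible reading of the name and should not be taken as the thesis's actual method: each thread is assumed to hold a private partial array (for example, partial forces), the shared array is split into as many blocks as there are threads, and in each round every thread adds its contribution to a different block, with a barrier between rounds. The function names and the use of Python threads are illustrative assumptions; in an OpenMP code the lock would correspond to `#pragma omp critical` and the barrier to `#pragma omp barrier`.

```python
import threading


def critical_section_reduce(partials):
    """Sum per-thread partial arrays into one array, serialized by a lock."""
    n = len(partials[0])
    total = [0.0] * n
    lock = threading.Lock()

    def worker(tid):
        # Only one thread at a time may update `total`; the others wait.
        with lock:
            for i in range(n):
                total[i] += partials[tid][i]

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(len(partials))]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return total


def block_rotation_reduce(partials):
    """Assumed scheme: rotate threads over disjoint blocks, no lock needed."""
    n_threads = len(partials)
    n = len(partials[0])
    total = [0.0] * n
    # Block b covers indices [bounds[b], bounds[b + 1]).
    bounds = [b * n // n_threads for b in range(n_threads + 1)]
    barrier = threading.Barrier(n_threads)

    def worker(tid):
        for rnd in range(n_threads):
            # In a given round every thread targets a different block,
            # so no two threads write to the same element.
            block = (tid + rnd) % n_threads
            for i in range(bounds[block], bounds[block + 1]):
                total[i] += partials[tid][i]
            # All threads finish the round before any moves to the next block.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return total


if __name__ == "__main__":
    # Four hypothetical threads, each with a private partial array of length 10.
    demo = [[float(t * 10 + i) for i in range(10)] for t in range(4)]
    print(critical_section_reduce(demo))
    print(block_rotation_reduce(demo))
```

Python threads do not run this arithmetic in parallel, so the sketch shows only the access pattern, not the performance difference reported in the abstract. In the locked version the threads update the shared array one after another, whereas in the rotated version all threads can work at once and synchronize only at the end of each round.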
[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Doctorate
[Year degree granted]: 2012
[Classification number]: TP338