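Research on Hybrid Parallel Molecular Dynamics Computation on Multi-core Clusters

Posted: 2018-02-03 23:58

Keywords: hybrid programming model; multi-core cluster; molecular dynamics; MPI; OpenMP. Source: University of Electronic Science and Technology of China, doctoral dissertation, 2012. Document type: degree thesis.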

[Abstract]: With the rapid development of high-performance computers and the growing abundance of computing resources, high-performance computing has become a focus of research worldwide. The mainstream architecture of high-performance computers has shifted from massively parallel processors to multi-core clusters, and systems have moved from a single memory model to a hybrid memory model. Parallel programs designed for these machines must adapt to this shift, which has given rise to hybrid parallel programming models. Molecular dynamics (MD) simulation is an important scientific research method that is widely used across many disciplines, so speeding up MD simulation on multi-core clusters, and thereby advancing research in those fields, has become urgent. However, when designing parallel MD algorithms and other parallel algorithms based on hybrid parallel programming models for multi-core clusters, researchers commonly find that introducing multithreaded parallelism incurs excessive overhead, so that the hybrid model often performs worse than the original pure message-passing model. Solving this class of problems, and so raising the speed of scientific and engineering programs on multi-core clusters, is therefore an important direction of current research. This thesis systematically surveys the state of research on hybrid parallel programming models and hybrid parallel MD algorithms, identifies their shortcomings, and on that basis proposes a series of optimized or improved algorithms. The main contents and contributions are as follows:

(1) The thesis analyzes in depth the basic principles and basic implementation methods of hybrid parallel programming models suited to multi-core clusters and of parallel MD algorithms, laying the foundation for the hybrid parallel MD algorithms proposed later.

(2) The thesis examines the scalability of multithreaded parallel MD computation with the Critical Section algorithm. Theoretical analysis and experimental results show that the speedup of the Critical Section algorithm drops markedly when the number of processor cores exceeds 8. The thesis then proposes an optimization called the triangular parallel MD algorithm, which statically assigns sets of atoms so that the threads enter the critical section at different times, reducing the time the critical section sits idle and speeding up the parallel computation.

(3) The thesis proposes an OpenMP-based parallel MD algorithm, the SPMD-like (Single Program Multiple Data) algorithm. It adopts the same strategy as an SPMD program: each thread processes its own data and redundantly computes the interactions that cross region boundaries. Its implementation, however, stays close to a simple OpenMP implementation. The internal computation logic of the MD code does not need to change; only a few data structures are modified and one spatial decomposition subroutine is added. The algorithm keeps the simplicity of an OpenMP implementation while achieving parallel performance and scalability close to those of the pure message-passing model.

(4) The thesis proposes a parallel MD algorithm for multi-core clusters based on the hybrid MPI/OpenMP model. Following the principle of making as few modifications as possible, it embeds the SPMD-like algorithm in a pure MPI parallel MD program. The hybrid program uses OpenMP parallelism within each node. While introducing only a small parallel overhead, it markedly reduces inter-node communication time and so effectively improves the speed and parallel efficiency of the MD program on multi-core clusters.

(5) The thesis proposes a reduction algorithm that avoids critical sections entirely, the block rotation reduction algorithm. It is about as simple as the Critical Section algorithm but has better parallel performance and scalability. Theoretical analysis and experimental tests show that it performs well with 16 processor cores per node, but with 32 or more cores it performs worse than the SPMD-like algorithm. The two algorithms therefore suit different hybrid parallel settings: with a modest number of cores per node, the simpler block rotation reduction can be chosen; with a larger number of cores, the better-performing SPMD-like algorithm can be used.

(6) The thesis proposes a parallel MD algorithm based on the hybrid MPI/TBB model and studies its implementation using LAMMPS as an example. Experimental results show that once the number of participating nodes in the multi-core cluster grows beyond a certain point, the hybrid model achieves better parallel performance than the pure MPI model, mainly because communication time is reduced.
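The abstract names the block rotation reduction in item (5) but does not spell out its steps. The sketch below shows one plausible reading of a lock-free reduction of that kind; it is not taken from the thesis. It assumes each thread holds a private, full-length array of partial results (for MD, partial forces), the shared array is split into as many blocks as there are threads, and in round r thread t adds only block (t + r) mod T, with a barrier between rounds. The function name `block_rotation_reduce` and the block layout are hypothetical, and Python threads are used only to show the schedule, not to measure performance.

```python
import threading


def block_rotation_reduce(private):
    """Sum per-thread private arrays into one shared array without locks.

    private[t] is thread t's full-length list of partial results.
    In round r, thread t writes only block (t + r) % T of the shared
    array, so no two threads touch the same element in the same round.
    (Illustrative sketch; the scheme is an assumption, not the thesis code.)
    """
    T = len(private)
    n = len(private[0])
    shared = [0.0] * n
    # Block b covers indices bounds[b] .. bounds[b + 1] - 1.
    bounds = [b * n // T for b in range(T + 1)]
    barrier = threading.Barrier(T)

    def worker(t):
        for r in range(T):
            b = (t + r) % T  # block owned by thread t in round r
            for i in range(bounds[b], bounds[b + 1]):
                shared[i] += private[t][i]
            # Every thread finishes round r before any block changes owner.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(T)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return shared
```

Calling `block_rotation_reduce([[1.0] * 8, [2.0] * 8])` sums the two private arrays element by element. After T rounds every thread has visited every block exactly once, which is why no critical section is needed; the cost is T barriers per reduction, which may be one reason such a scheme scales less well at high core counts.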

[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Doctoral
[Year awarded]: 2012
[Classification number]: TP338

