Communication Scalability in Large-Scale Parallel Computing: Analysis, Optimization, and Simulation
Published: 2018-04-22 20:31
Topics: parallel computing, communication scalability; Source: doctoral dissertation, National University of Defense Technology, 2013
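[Abstract]: As systems grow in scale and per-node computing power increases, communication has become a major bottleneck limiting the scalability of parallel computing. The communication scalability problem asks which factors affect communication and how large their influence must become before it limits system scalability. It is one of the most challenging theoretical problems in parallel computing.
To address this problem, this dissertation is the first to quantify the communication wall of parallel computing in terms of speedup, and it builds a communication scalability model. Drawing on the conclusions of this model, it proposes two techniques: a message-independence-guided program optimization technique (for program optimization) and a multi-job allocation optimization technique (for task allocation optimization). Finally, it designs and implements a performance-prediction simulator for large-scale parallel computing. The simulator can be used to validate the communication scalability model and to evaluate the scalability of related optimization techniques for parallel systems.
Specifically, the main contributions and innovations are as follows:
1. Establishing a communication scalability model (Chapter 2)
Internationally, the communication scalability problem has so far been understood mostly intuitively, without systematic quantitative study. This dissertation gives the first quantitative description of the communication wall and proves a theorem on its existence. On this basis it builds the communication scalability model. It also proposes a system metric and a method for classifying parallel systems according to the model, and it quantifies both the communication scalability and the generalized communication scalability of a system. Finally, through concrete case studies, it analyzes how programs, parallel machine topologies, and common optimization methods affect communication scalability. It compares the generalized communication scalability of common supercomputer topologies and identifies directions for improving both communication scalability and generalized communication scalability.
2. A message-independence-guided program optimization technique (Chapter 3)
Communication hiding based on instruction reordering is one of the main ways to improve program performance. Besides the problems inherent to the technique itself, however, it can cause severe contention for network resources among messages. By analyzing the causes of this contention, the dissertation introduces the concept of message independence for the first time and studies its precise meaning. It then builds, for MPI (Message Passing Interface) programs, a message-independence-guided optimization model based on instruction reordering. Using this model, it designs and implements a corresponding optimization method that reduces network resource contention among messages while still maximizing communication hiding. Experiments on a parallel CFD (Computational Fluid Dynamics) application show that the method effectively reduces communication overhead and improves performance.
3. A multi-job allocation optimization technique (Chapter 4)
Allocating computing resources sensibly among multiple jobs, so that each job meets its performance requirements, is important for users of large-scale parallel computing systems. The dissertation is the first to decompose multi-job allocation optimization into two subproblems: multi-job distribution optimization and single-job task mapping optimization. For the former, it introduces a closed minimum graph partitioning model, which transforms multi-job distribution optimization into a closed minimum graph partitioning problem. For the latter, it analyzes how communication protocols affect communication overhead and proposes PaPP, the first protocol-aware process mapping model for MPI programs. Based on these two models, it designs and implements a multi-job allocation optimization method. Experiments on the NPB (NAS Parallel Benchmarks) suite show that the method delivers good performance improvements.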
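4. Design and implementation of VACED-SIM, a virtual-real combined execution-driven simulator (Chapter 5)
Discrete event simulation is one of the most common performance-prediction methods for large-scale parallel computing. Based on an in-depth analysis of discrete event simulation, the dissertation introduces the concepts of virtual simulation and real simulation. It compares virtual with real simulation, and trace-driven with execution-driven methods. From this comparison, it is the first to classify discrete-event-based performance prediction methods into four categories along two orthogonal dimensions: simulation mechanism and event-driving method. Targeting the characteristics of scalability prediction for large-scale parallel computing, it proposes the first model of the fourth category, the virtual-real combined execution-driven (VACED) simulation method. Based on this model, it designs and implements VACED-SIM, a lightweight VACED simulator. VACED-SIM introduces a fine-grained method for defining activities and events, which improves simulation accuracy. Experiments on a Tianhe-1A subsystem show that VACED-SIM is highly accurate and efficient.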
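Item 4 builds on discrete event simulation. For readers unfamiliar with the mechanism, the Python sketch below shows the core of a generic discrete event simulator. It keeps a virtual clock and a priority queue of timestamped events, and the clock jumps directly from one event to the next. It is not VACED-SIM and does not execute real application code: the compute phase and the message are purely modeled events. The message cost uses the common latency-plus-size-over-bandwidth (Hockney) model. The names `Simulator` and `simulate_exchange` and all parameter values are hypothetical.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Event:
    time: float
    seq: int  # tie-breaker: events at equal times run in scheduling order
    action: Callable[..., None] = field(compare=False)


class Simulator:
    """Generic discrete-event core (illustrative, hypothetical): a virtual
    clock plus a time-ordered event queue."""

    def __init__(self) -> None:
        self.now = 0.0
        self._queue: list[Event] = []
        self._seq = 0

    def schedule(self, delay: float, action: Callable[..., None]) -> None:
        heapq.heappush(self._queue, Event(self.now + delay, self._seq, action))
        self._seq += 1

    def run(self) -> float:
        while self._queue:
            ev = heapq.heappop(self._queue)
            self.now = ev.time  # jump the virtual clock to the next event
            ev.action(self)
        return self.now


def simulate_exchange(compute_s: float, msg_bytes: int,
                      latency_s: float = 1e-6,
                      bandwidth_Bps: float = 1e9) -> float:
    """Hypothetical two-rank scenario: rank 0 computes, then sends one message
    to rank 1. Returns the simulated time at which rank 1 receives it.
    Message cost is an assumed Hockney model: latency + size / bandwidth."""
    sim = Simulator()
    finish = {}

    def recv_done(s: Simulator) -> None:
        finish[1] = s.now

    def send(s: Simulator) -> None:
        s.schedule(latency_s + msg_bytes / bandwidth_Bps, recv_done)

    sim.schedule(compute_s, send)  # rank 0's compute phase as a timed activity
    sim.run()
    return finish[1]
```

For example, `simulate_exchange(1e-3, 1_000_000)` models 1 ms of computation followed by a 1 MB message at 1 GB/s with 1 µs latency. Rank 1 therefore receives the message at about 2.001 ms of simulated time. An execution-driven simulator typically obtains such compute durations by actually running the program's code rather than by assuming them.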
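Item 1 (Chapter 2) quantifies the communication wall through speedup. The abstract does not state the dissertation's model or its existence theorem. The Python sketch below therefore uses an assumed, textbook-style cost model purely for illustration. Computation divides perfectly across p processes, and a tree-structured collective adds a cost proportional to log2(p). Under that assumption the speedup S(p) = T(1)/T(p) first rises, then peaks, and then falls, which is one simple way to see a communication wall. The function names and parameters are hypothetical.

```python
import math


def run_time(p: int, work_s: float, comm_s_per_level: float) -> float:
    """Assumed cost model (not the dissertation's): perfectly divisible
    computation plus a tree collective costing comm_s_per_level per level."""
    return work_s / p + comm_s_per_level * math.log2(p)


def speedup(p: int, work_s: float, comm_s_per_level: float) -> float:
    """S(p) = T(1) / T(p) under the assumed model."""
    return run_time(1, work_s, comm_s_per_level) / run_time(p, work_s, comm_s_per_level)


def find_wall(work_s: float, comm_s_per_level: float,
              max_log2_p: int = 20) -> tuple[int, float]:
    """Scan p = 1, 2, 4, ..., 2**max_log2_p and return (p, speedup) at the
    peak. Beyond the peak, adding processes slows the run under this model."""
    best = (1, 1.0)
    for k in range(max_log2_p + 1):
        p = 2 ** k
        s = speedup(p, work_s, comm_s_per_level)
        if s > best[1]:
            best = (p, s)
    return best
```

With 100 s of work and 10 ms per tree level, `find_wall(100.0, 0.01)` returns p = 8192 with a speedup of roughly 703. Setting the derivative of T(p) = W/p + c·log2(p) to zero gives the continuous optimum p* = W·ln 2 / c ≈ 6931. Among powers of two, 8192 has the smallest modeled run time. With c = 0 the speedup equals p, and no wall appears.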
[Degree-granting institution]: National University of Defense Technology
[Degree]: Doctorate
[Year conferred]: 2013
[CLC classification number]: TP338.6