Research and Implementation of Key Algorithms for GPU Clusters
Topics: LU-SGS + GPU; Source: Hangzhou Dianzi University, master's thesis, 2017
[Abstract]: GPUs are no longer used only for graphics and imaging. In recent years their architecture and growing floating-point performance have driven progress across numerical computing; for tasks that parallelize easily, computation can be sped up by more than ten times, or even several tens of times. In many computation-heavy fields GPUs deliver significant speedups. In machine learning, the feature dimension of a logistic regression model can reach hundreds of millions, so single-machine training, and even simple parallel processing, can no longer meet training requirements. The best solution is to split the high-dimensional feature vector into several smaller vectors and solve on those. Parallel machine learning algorithms have improved this situation: thousands or even tens of thousands of machines train in parallel, which raises running speed. In the energy sector, nuclear power is low-carbon, energy-dense, and highly sustainable, which gives it development prospects that cannot be substituted. Core fuel management is one of the key concerns of a nuclear power plant and directly affects the economics and cost of nuclear power. For a large core, the diffusion equation system generally has a very high order, is complex, and is extremely time-consuming to solve, so computing the diffusion equation is a critical step in the core fuel management workflow. The main work and contributions of this thesis are as follows.

(1) With the reactor diffusion equation as background, the LU-SGS iteration is extracted from unstructured-grid flow-field computation. One-dimensional and two-dimensional grid blocks are partitioned into multiple domains, the domains are distributed evenly across GPU thread blocks, and CUDA and MPI are used to run the LU-SGS iteration in parallel on a GPU cluster and on CPUs. Experiments show that, compared with the serial program, the GPU greatly improves execution efficiency, which confirms the value of GPUs in numerical computing.

(2) The parallel logistic regression algorithm is derived and analyzed in detail. The thesis proposes replacing the DHT algorithm with the Raft algorithm in order to change the consistency requirements of the parameter server, and it analyzes the significant impact of GPUs on the parameter server.

(3) Both the LU-SGS iteration and the parallel logistic regression algorithm from machine learning have a low degree of parallelism. Because threads in different GPU blocks cannot communicate and cannot access each other's shared memory, the iteration cannot exchange data by the "collision" method. The thesis therefore proposes a "delayed" iteration in which each iteration performs one fewer forward or backward step. This method reduces the execution time of the LU-SGS algorithm by about 20%.
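To make the domain-partitioned sweep in contribution (1) concrete, here is a minimal sketch, not the thesis's implementation. It applies one forward and one backward Gauss-Seidel sweep per domain (the symmetric Gauss-Seidel pattern that LU-SGS is named after) to a toy one-dimensional system. The tridiagonal coefficients, the zero boundary values, and the names `sgs_domain_sweep`, `solve`, and `residual` are all assumptions made for illustration. The code runs serially in Python; the loop over domains stands in for independent GPU thread blocks, so each domain reads its neighbours' values from the previous iterate only.

```python
def sgs_domain_sweep(x_old, b, diag, off, lo, hi):
    """One forward + one backward Gauss-Seidel sweep over cells lo..hi-1.

    Cells outside [lo, hi) are read from x_old (the previous iterate), which
    mimics thread blocks that cannot see each other's updates mid-sweep.
    """
    n = len(b)
    x = x_old[lo:hi]  # local copy owned by this domain

    def neighbour(i):
        if lo <= i < hi:
            return x[i - lo]      # freshest value inside the domain
        if 0 <= i < n:
            return x_old[i]       # lagged value from another domain
        return 0.0                # zero boundary value (assumption)

    for i in range(lo, hi):                  # forward sweep
        x[i - lo] = (b[i] - off * (neighbour(i - 1) + neighbour(i + 1))) / diag
    for i in range(hi - 1, lo - 1, -1):      # backward sweep
        x[i - lo] = (b[i] - off * (neighbour(i - 1) + neighbour(i + 1))) / diag
    return x


def solve(b, diag=2.5, off=-1.0, n_domains=4, tol=1e-10, max_iter=10_000):
    """Iterate domain sweeps until the largest update falls below tol."""
    n = len(b)
    bounds = [(k * n // n_domains, (k + 1) * n // n_domains)
              for k in range(n_domains)]
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        x_new = []
        for lo, hi in bounds:  # each pass could map to one thread block
            x_new.extend(sgs_domain_sweep(x, b, diag, off, lo, hi))
        change = max(abs(u - v) for u, v in zip(x_new, x))
        x = x_new
        if change < tol:
            return x, it
    return x, max_iter


def residual(x, b, diag=2.5, off=-1.0):
    """Largest absolute residual of the tridiagonal system."""
    n = len(b)
    worst = 0.0
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        worst = max(worst, abs(diag * x[i] + off * (left + right) - b[i]))
    return worst


if __name__ == "__main__":
    b = [1.0] * 32
    x, iters = solve(b, n_domains=4)
    print(iters, residual(x, b))
```

The sketch converges because the assumed matrix is strictly diagonally dominant. Lagging the values at domain boundaries typically costs extra iterations compared with a single global sweep, which is the trade-off a partitioned scheme accepts in exchange for independent domains.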
[Degree-granting institution]: Hangzhou Dianzi University
[Degree level]: Master's
[Year conferred]: 2017
[Classification number]: TP391.41; TP181
Article ID: 2050211
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2050211.html