面向稀疏矩陣運算的異構(gòu)并行算法研究

發(fā)布時間：2018-08-12 18:52

【摘要】：異構(gòu)高性能體系結(jié)構(gòu)已經(jīng)成為高性能計算領(lǐng)域越來越流行的一種架構(gòu),極大的推動著當(dāng)今社會大規(guī)模科學(xué)與工程領(lǐng)域計算的進(jìn)步。稀疏矩陣運算是大規(guī)�？茖W(xué)與工程領(lǐng)域計算的非常重要的一種操作,但是由于稀疏矩陣運算是一種典型的不規(guī)則運算,嚴(yán)重限制著異構(gòu)高性能體系結(jié)構(gòu)計算性能的發(fā)揮。這主要是由于,一方面,稀疏矩陣運算的程序行為是由矩陣的稀疏結(jié)構(gòu)確定的,不同應(yīng)用所對應(yīng)的稀疏矩陣的稀疏結(jié)構(gòu)不同,難以發(fā)掘其中的局部性,另一方面,由于稀疏矩陣的壓縮存儲導(dǎo)致大量的間址訪存和隨機訪存,從而系統(tǒng)的訪存帶寬成為稀疏矩陣運算的性能瓶頸。為了應(yīng)對這些問題和挑戰(zhàn),本文以稀疏矩陣運算為背景,研究如何充分利用異構(gòu)體系結(jié)構(gòu)中的計算資源和訪存帶寬提高稀疏矩陣運算的性能,主要工作和創(chuàng)新點如下:(1)提出了面向異構(gòu)體系結(jié)構(gòu)的稀疏矩陣向量乘算法的高效并行優(yōu)化算法本文提出了一種基于負(fù)載的多種并行模式混合的稀疏矩陣向量乘算法。針對SpMV操作中不規(guī)則訪存多和負(fù)載不均衡等問題,本文基于計算負(fù)載對稀疏矩陣進(jìn)行分塊,減少由非零元個數(shù)變化所引入的存儲開銷;根據(jù)每行的計算負(fù)載選擇不同的線程映射方式,實現(xiàn)線程間負(fù)載均衡;設(shè)計一種基于warp的負(fù)載分配策略,減少線程間同步開銷。實驗結(jié)果表明,本文提出稀疏矩陣向量乘算法能夠有效提高算法的性能。(2)提出了面向異構(gòu)體系結(jié)構(gòu)的稀疏矩陣轉(zhuǎn)置向量乘的高效并行優(yōu)化算法本文提出了無矩陣轉(zhuǎn)置的稀疏矩陣轉(zhuǎn)置向量乘算法。針對稀疏矩陣轉(zhuǎn)置向量乘算法的轉(zhuǎn)置或原子開銷大等問題,本文采用行壓縮格式減少預(yù)處理開銷,采用外積的計算方式避免稀疏矩陣轉(zhuǎn)置的操作;采用多路并行歸并排序消除原子操作。實驗結(jié)果表明,本文所提出的稀疏矩陣轉(zhuǎn)置向量乘算法能夠獲得有效的性能提升。(3)提出了面向異構(gòu)體系結(jié)構(gòu)的稀疏矩陣稀疏矩陣乘的高效并行優(yōu)化算法本文提出了一種異構(gòu)協(xié)同的稀疏矩陣稀疏矩陣乘算法。采用上限分配和逐次增加的混合空間分配策略,提高存儲空間資源利用率;通過對中間結(jié)果進(jìn)行多路并行歸并排序和并行前向掃描操作,高效地實現(xiàn)中間結(jié)果的累加操作;根據(jù)結(jié)果矩陣每列非零元素個數(shù)進(jìn)行任務(wù)劃分,實現(xiàn)線程間的負(fù)載均衡;根據(jù)CPU-GPU的計算能力進(jìn)行任務(wù)劃分,實現(xiàn)異構(gòu)協(xié)同并行,提高計算資源的利用率。實驗結(jié)果表明,本文所提出的稀疏矩陣稀疏矩陣乘算法具有較好的性能提升。(4)提出了面向異構(gòu)體系結(jié)構(gòu)平臺的稀疏三角方程組求解的高效并行優(yōu)化算法本文提出了一種基于層次化并行的稀疏三角方程組并行求解算法。針對不同層次并行度不同的特征,設(shè)計了基于群并行模式和流水線并行模式兩種方式的混合并行模式。利用群并行模式處理并行度較高的任務(wù)層,利用流水線并行模式處理并行度較低的任務(wù)層。在此基礎(chǔ)上,基于任務(wù)層的計算任務(wù)實現(xiàn)計算過程中的線程間負(fù)載均衡策略。最后設(shè)計了一種CPU-GPU異構(gòu)協(xié)同的稀疏三角方程組求解的并行算法,實驗結(jié)果表明,本文提出的異構(gòu)并行算法能夠有效的挖掘CPU和GPU的計算性能,提高異構(gòu)體系結(jié)構(gòu)中的計算效率。
[Abstract]:Heterogeneous high-performance architectures have become more and more popular in the field of high-performance computing, which greatly promote the progress of large-scale computing in science and engineering. Sparse matrix computing is a very important operation in large-scale computing in science and engineering, but sparse matrix computing is a classic one. This is mainly because, on the one hand, the program behavior of sparse matrix operation is determined by the sparse structure of the matrix, and the sparse structure of the sparse matrix corresponding to different applications is different, so it is difficult to discover the locality of the sparse matrix, on the other hand, because of the sparse structure. Compressed storage of matrices leads to a large number of inter-address memory access and random memory access, so the system memory access bandwidth becomes the performance bottleneck of sparse matrix operation. In order to deal with these problems and challenges, this paper studies how to make full use of computing resources and memory access bandwidth in heterogeneous architectures to improve the performance of sparse matrix operation. The main work and innovations are as follows: (1) An efficient parallel optimization algorithm for sparse matrix vector multiplication algorithm based on heterogeneous architecture is proposed. A load-based sparse matrix vector multiplication algorithm with multiple parallel modes is proposed. The sparse matrix is partitioned based on the computing load to reduce the storage overhead caused by the change of the number of non-zero elements; different thread mapping methods are selected according to the computing load of each row to achieve load balance between threads; a load allocation strategy based on warp is designed to reduce the synchronization overhead between threads. Matrix vector multiplication algorithm can effectively improve the performance of the algorithm. (2) An efficient parallel optimization algorithm for sparse matrix transpose vector multiplication is proposed for heterogeneous architecture. A sparse matrix transpose vector multiplication algorithm without matrix transposition is proposed. The row compression scheme is used to reduce the preprocessing overhead, and the outer product calculation method is used to avoid the operation of sparse matrix transposition. The multi-channel parallel merge sort is used to eliminate the atom operation. The experimental results show that the proposed sparse matrix transposition vector multiplication algorithm can achieve effective performance improvement. (3) A heterogeneous architecture oriented sparse architecture is proposed. An efficient parallel optimization algorithm for sparse matrix multiplication is proposed in this paper.A heterogeneous cooperative sparse matrix multiplication algorithm is proposed.The upper bound allocation and the successive increase of the mixed space allocation strategy are used to improve the utilization of the storage space resources.The intermediate results are sorted by multi-parallel mergers and parallel forward scanning operations. The results show that the sparse matrix multiplication proposed in this paper can improve the utilization ratio of computing resources. (4) An efficient parallel algorithm for solving sparse trigonometric equations based on heterogeneous architecture platform is proposed. A parallel algorithm for solving sparse trigonometric equations based on hierarchical parallelism is proposed. A hybrid parallel mode of waterline parallel mode is proposed. The task layer with higher parallelism is processed by group parallel mode and the task layer with lower parallelism is processed by pipeline parallel mode. The parallel algorithm for solving sparse trigonometric equations is constructed. The experimental results show that the heterogeneous parallel algorithm proposed in this paper can effectively mine the computational performance of CPU and GPU, and improve the computational efficiency in heterogeneous architecture.
【學(xué)位授予單位】：國防科學(xué)技術(shù)大學(xué)
【學(xué)位級別】：博士
【學(xué)位授予年份】：2015
【分類號】：O241.6

【相似文獻(xiàn)】

相關(guān)期刊論文前10條

1 張興令;關(guān)于高階稀疏矩陣求解的一點注記[J];甘肅工業(yè)大學(xué)學(xué)報;1988年02期

2 張奠成 ,姚棟義;電子電路機助分析和設(shè)計中的稀疏矩陣技術(shù)[J];合肥工業(yè)大學(xué)學(xué)報;1981年02期

3 匡云太;一個縮減非對稱稀疏矩陣的帶寬和外形的算法[J];同濟大學(xué)學(xué)報;1987年03期

4 于繼業(yè);稀疏矩陣塊對角化的一種方法[J];數(shù)學(xué)的實踐與認(rèn)識;1988年03期

5 黃東泉;有向圖在結(jié)構(gòu)不對稱稀疏矩陣重排序中的應(yīng)用[J];西安交通大學(xué)學(xué)報;1982年06期

6 陸黎明;陳海強;朱鴻鶚;;稀疏矩陣技術(shù)在網(wǎng)絡(luò)分析中的應(yīng)用[J];上海師范學(xué)院學(xué)報(自然科學(xué)版);1984年03期

7 鄭志鎮(zhèn),李尚健,李志剛;稀疏矩陣帶寬減小的一種算法[J];華中理工大學(xué)學(xué)報;1998年12期

8 秦體恒;李學(xué)相;安學(xué)慶;;稀疏矩陣存儲算法的探討[J];河南機電高等�？茖W(xué)校學(xué)報;2008年01期

9 周永法;稀疏矩陣的并行算法[J];北京航空學(xué)院學(xué)報;1982年04期

10 鄭金華;稀疏矩陣的存儲結(jié)構(gòu)和乘法運算[J];湘潭大學(xué)自然科學(xué)學(xué)報;1994年02期

相關(guān)會議論文前3條

1 宋琦;陳璞;;稀疏求解—結(jié)構(gòu)修改的一種新的可能性[A];北京力學(xué)會第20屆學(xué)術(shù)年會論文集[C];2014年

2 徐道遠(yuǎn);王寶庭;王向東;馮伯林;;求解大型稀疏矩陣的ICCG法[A];第八屆全國結(jié)構(gòu)工程學(xué)術(shù)會議論文集（第Ⅰ卷）[C];1999年

3 苑維然;陳璞;劉凱欣;;非對稱線性方程組的快速外存解法[A];中國力學(xué)學(xué)會學(xué)術(shù)大會'2005論文摘要集（下）[C];2005年

相關(guān)博士學(xué)位論文前1條

1 郭松;面向稀疏矩陣運算的異構(gòu)并行算法研究[D];國防科學(xué)技術(shù)大學(xué);2015年

相關(guān)碩士學(xué)位論文前10條

1 劉健;基于稀疏矩陣分解的特征基因識別方法研究[D];曲阜師范大學(xué);2015年

2 莊立;稀疏矩陣向量乘及自動調(diào)優(yōu)[D];杭州電子科技大學(xué);2011年

3 王冬;面向差異特征識別的稀疏矩陣分解方法的研究[D];曲阜師范大學(xué);2016年

4 馮廣祥;大型稀疏矩陣直接求解算法的研究及實現(xiàn)[D];東北大學(xué);2010年

5 丁玲;低秩與稀疏矩陣恢復(fù)問題的若干研究[D];浙江大學(xué);2012年

6 吳超凡;基于UB樹的大型稀疏矩陣存儲研究[D];云南大學(xué);2013年

7 王亞南;基于FPGA的稀疏矩陣分解實現(xiàn)[D];西安電子科技大學(xué);2009年

8 趙加強;基于OpenCL的稀疏矩陣向量乘優(yōu)化[D];吉林大學(xué);2012年

9 施浩;基于FPGA的稀疏矩陣向量乘的優(yōu)化研究與實現(xiàn)[D];南京郵電大學(xué);2011年

10 胡耀國;基于GPU的有限元方法研究[D];華中科技大學(xué);2011年

，

本文編號：2179981

資料下載

論文發(fā)表

支付寶下載

Download by Alipay
微信下載

Download by Wechat
會員下載

Download by Member

本文鏈接：http://sikaile.net/kejilunwen/yysx/2179981.html

上一篇：非自治反映擴散方程后項緊拉回吸引子的存在性
下一篇：具有三物種的食餌-捕食反應(yīng)擴散時滯系統(tǒng)的穩(wěn)定性與行波解（英文）

論文發(fā)表

·知網(wǎng)|萬方|維普|龍源|省級|國家級|科技核心|北大核心|南大核心CSSCI|EI|SCI|SSCI|

天堂国产午夜亚洲专区-少妇人妻综合久久蜜臀-国产成人户外露出视频在线-国产91传媒一区二区三区

面向稀疏矩陣運算的異構(gòu)并行算法研究