面向稀疏矩陣運(yùn)算的異構(gòu)并行算法研究
[Abstract]:Heterogeneous high-performance architectures have become more and more popular in the field of high-performance computing, which greatly promote the progress of large-scale computing in science and engineering. Sparse matrix computing is a very important operation in large-scale computing in science and engineering, but sparse matrix computing is a classic one. This is mainly because, on the one hand, the program behavior of sparse matrix operation is determined by the sparse structure of the matrix, and the sparse structure of the sparse matrix corresponding to different applications is different, so it is difficult to discover the locality of the sparse matrix, on the other hand, because of the sparse structure. Compressed storage of matrices leads to a large number of inter-address memory access and random memory access, so the system memory access bandwidth becomes the performance bottleneck of sparse matrix operation. In order to deal with these problems and challenges, this paper studies how to make full use of computing resources and memory access bandwidth in heterogeneous architectures to improve the performance of sparse matrix operation. The main work and innovations are as follows: (1) An efficient parallel optimization algorithm for sparse matrix vector multiplication algorithm based on heterogeneous architecture is proposed. A load-based sparse matrix vector multiplication algorithm with multiple parallel modes is proposed. The sparse matrix is partitioned based on the computing load to reduce the storage overhead caused by the change of the number of non-zero elements; different thread mapping methods are selected according to the computing load of each row to achieve load balance between threads; a load allocation strategy based on warp is designed to reduce the synchronization overhead between threads. Matrix vector multiplication algorithm can effectively improve the performance of the algorithm. (2) An efficient parallel optimization algorithm for sparse matrix transpose vector multiplication is proposed for heterogeneous architecture. A sparse matrix transpose vector multiplication algorithm without matrix transposition is proposed. The row compression scheme is used to reduce the preprocessing overhead, and the outer product calculation method is used to avoid the operation of sparse matrix transposition. The multi-channel parallel merge sort is used to eliminate the atom operation. The experimental results show that the proposed sparse matrix transposition vector multiplication algorithm can achieve effective performance improvement. (3) A heterogeneous architecture oriented sparse architecture is proposed. An efficient parallel optimization algorithm for sparse matrix multiplication is proposed in this paper.A heterogeneous cooperative sparse matrix multiplication algorithm is proposed.The upper bound allocation and the successive increase of the mixed space allocation strategy are used to improve the utilization of the storage space resources.The intermediate results are sorted by multi-parallel mergers and parallel forward scanning operations. The results show that the sparse matrix multiplication proposed in this paper can improve the utilization ratio of computing resources. (4) An efficient parallel algorithm for solving sparse trigonometric equations based on heterogeneous architecture platform is proposed. A parallel algorithm for solving sparse trigonometric equations based on hierarchical parallelism is proposed. A hybrid parallel mode of waterline parallel mode is proposed. The task layer with higher parallelism is processed by group parallel mode and the task layer with lower parallelism is processed by pipeline parallel mode. The parallel algorithm for solving sparse trigonometric equations is constructed. The experimental results show that the heterogeneous parallel algorithm proposed in this paper can effectively mine the computational performance of CPU and GPU, and improve the computational efficiency in heterogeneous architecture.
【學(xué)位授予單位】:國防科學(xué)技術(shù)大學(xué)
【學(xué)位級別】:博士
【學(xué)位授予年份】:2015
【分類號】:O241.6
【相似文獻(xiàn)】
相關(guān)期刊論文 前10條
1 張興令;關(guān)于高階稀疏矩陣求解的一點注記[J];甘肅工業(yè)大學(xué)學(xué)報;1988年02期
2 張奠成 ,姚棟義;電子電路機(jī)助分析和設(shè)計中的稀疏矩陣技術(shù)[J];合肥工業(yè)大學(xué)學(xué)報;1981年02期
3 匡云太;一個縮減非對稱稀疏矩陣的帶寬和外形的算法[J];同濟(jì)大學(xué)學(xué)報;1987年03期
4 于繼業(yè);稀疏矩陣塊對角化的一種方法[J];數(shù)學(xué)的實踐與認(rèn)識;1988年03期
5 黃東泉;有向圖在結(jié)構(gòu)不對稱稀疏矩陣重排序中的應(yīng)用[J];西安交通大學(xué)學(xué)報;1982年06期
6 陸黎明;陳海強(qiáng);朱鴻鶚;;稀疏矩陣技術(shù)在網(wǎng)絡(luò)分析中的應(yīng)用[J];上海師范學(xué)院學(xué)報(自然科學(xué)版);1984年03期
7 鄭志鎮(zhèn),李尚健,李志剛;稀疏矩陣帶寬減小的一種算法[J];華中理工大學(xué)學(xué)報;1998年12期
8 秦體恒;李學(xué)相;安學(xué)慶;;稀疏矩陣存儲算法的探討[J];河南機(jī)電高等�?茖W(xué)校學(xué)報;2008年01期
9 周永法;稀疏矩陣的并行算法[J];北京航空學(xué)院學(xué)報;1982年04期
10 鄭金華;稀疏矩陣的存儲結(jié)構(gòu)和乘法運(yùn)算[J];湘潭大學(xué)自然科學(xué)學(xué)報;1994年02期
相關(guān)會議論文 前3條
1 宋琦;陳璞;;稀疏求解—結(jié)構(gòu)修改的一種新的可能性[A];北京力學(xué)會第20屆學(xué)術(shù)年會論文集[C];2014年
2 徐道遠(yuǎn);王寶庭;王向東;馮伯林;;求解大型稀疏矩陣的ICCG法[A];第八屆全國結(jié)構(gòu)工程學(xué)術(shù)會議論文集(第Ⅰ卷)[C];1999年
3 苑維然;陳璞;劉凱欣;;非對稱線性方程組的快速外存解法[A];中國力學(xué)學(xué)會學(xué)術(shù)大會'2005論文摘要集(下)[C];2005年
相關(guān)博士學(xué)位論文 前1條
1 郭松;面向稀疏矩陣運(yùn)算的異構(gòu)并行算法研究[D];國防科學(xué)技術(shù)大學(xué);2015年
相關(guān)碩士學(xué)位論文 前10條
1 劉健;基于稀疏矩陣分解的特征基因識別方法研究[D];曲阜師范大學(xué);2015年
2 莊立;稀疏矩陣向量乘及自動調(diào)優(yōu)[D];杭州電子科技大學(xué);2011年
3 王冬;面向差異特征識別的稀疏矩陣分解方法的研究[D];曲阜師范大學(xué);2016年
4 馮廣祥;大型稀疏矩陣直接求解算法的研究及實現(xiàn)[D];東北大學(xué);2010年
5 丁玲;低秩與稀疏矩陣恢復(fù)問題的若干研究[D];浙江大學(xué);2012年
6 吳超凡;基于UB樹的大型稀疏矩陣存儲研究[D];云南大學(xué);2013年
7 王亞南;基于FPGA的稀疏矩陣分解實現(xiàn)[D];西安電子科技大學(xué);2009年
8 趙加強(qiáng);基于OpenCL的稀疏矩陣向量乘優(yōu)化[D];吉林大學(xué);2012年
9 施浩;基于FPGA的稀疏矩陣向量乘的優(yōu)化研究與實現(xiàn)[D];南京郵電大學(xué);2011年
10 胡耀國;基于GPU的有限元方法研究[D];華中科技大學(xué);2011年
,本文編號:2179981
本文鏈接:http://sikaile.net/kejilunwen/yysx/2179981.html