
Performance Optimization of Sparse Matrix-Vector Multiplication on Intel Xeon Phi

Published: 2019-04-21 19:35

[Abstract]: Sparse matrix-vector multiplication (SpMV) is a key computational kernel in scientific computing, for example in solving linear systems. On the Intel Xeon Phi many-core (Many Integrated Core) architecture, conventional SpMV algorithms suffer from low SIMD utilization, high irregular-memory-access overhead, and load imbalance, so they struggle to exploit the processor's compute capability. Targeting the architectural features of Intel Xeon Phi, this paper proposes a general parallel SpMV algorithm based on a blocked, compressed storage representation: (1) starting from the ELLPACK storage format, the matrix is partitioned into column blocks and compressed, which increases the density of nonzero elements and improves SIMD utilization; (2) careful data reordering preserves the inherent locality of the matrix's nonzeros, which improves data reuse and reduces memory-access overhead; (3) the compressed matrix is divided into blocks of approximately equal size, which are statically assigned in equal numbers to the cores so that the load is balanced across cores. Experimental results show that, compared with the CSR algorithm in the MKL math library available on Intel Xeon Phi, the proposed algorithm achieves a higher computation-to-memory-access ratio and runs on average 2.05 times faster than MKL's CSR algorithm.
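The following sketch makes ideas (1) and (3) of the abstract concrete. It is a minimal pure-Python model built under our own simplifying assumptions, not the authors' actual format.

It contains four pieces:

- A textbook CSR kernel, which uses the storage format of the MKL baseline.
- A plain ELLPACK slab stored slot-major, so that consecutive SIMD lanes would process consecutive rows.
- A hypothetical column-blocked variant that builds one ELLPACK slab per column block and drops rows with no nonzeros in that block. This cuts padding, i.e. it raises the nonzero density.
- A static, equal split of work items across cores (`static_chunks`).

All function names, the `Slab` layout, `block_width`, and the empty-row compression rule are illustrative assumptions. The data-reordering step (2) is not modeled.

```python
"""Sketch: CSR vs. ELLPACK vs. column-blocked ELLPACK SpMV (y = A x).
All names, the slab layout and the per-block row compression are assumptions.
"""
from dataclasses import dataclass
from typing import List, Tuple

CSR = Tuple[List[int], List[int], List[float]]  # (indptr, indices, data)

@dataclass
class Slab:
    rows: List[int]    # original row index of each packed row
    width: int         # ELLPACK width = length of the longest packed row
    cols: List[int]    # slot-major layout: entry (r, k) lives at k * len(rows) + r
    vals: List[float]
    nnz: int           # real nonzeros; the remaining slots are padding

def csr_spmv(csr: CSR, x: List[float]) -> List[float]:
    """Baseline CSR kernel: rows have irregular lengths, which hampers SIMD."""
    indptr, indices, data = csr
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for p in range(indptr[i], indptr[i + 1]):
            y[i] += data[p] * x[indices[p]]
    return y

def csr_rows(csr: CSR) -> List[List[Tuple[int, float]]]:
    indptr, indices, data = csr
    return [list(zip(indices[indptr[i]:indptr[i + 1]], data[indptr[i]:indptr[i + 1]]))
            for i in range(len(indptr) - 1)]

def build_ell(entries: List[List[Tuple[int, float]]], row_ids: List[int]) -> Slab:
    """Pack rows into one ELLPACK slab; short rows are padded with (col 0, 0.0),
    so padding adds nothing to y but would still occupy a SIMD lane."""
    m = len(row_ids)
    width = max((len(e) for e in entries), default=0)
    cols, vals = [0] * (width * m), [0.0] * (width * m)
    for r, row in enumerate(entries):
        for k, (c, v) in enumerate(row):
            cols[k * m + r], vals[k * m + r] = c, v
    return Slab(row_ids, width, cols, vals, sum(len(e) for e in entries))

def ell_accumulate(s: Slab, x: List[float], y: List[float]) -> None:
    """y[s.rows] += slab * x. The inner loop over r is unit-stride, one row per lane."""
    m = len(s.rows)
    acc = [0.0] * m
    for k in range(s.width):
        base = k * m
        for r in range(m):
            acc[r] += s.vals[base + r] * x[s.cols[base + r]]
    for r, i in enumerate(s.rows):
        y[i] += acc[r]

def blocked_ell(csr: CSR, n_cols: int, block_width: int) -> List[Slab]:
    """One slab per column block; rows with no nonzeros in that block are dropped."""
    rows, slabs = csr_rows(csr), []
    for lo in range(0, n_cols, block_width):
        ids, ents = [], []
        for i, row in enumerate(rows):
            part = [(c, v) for c, v in row if lo <= c < lo + block_width]
            if part:
                ids.append(i)
                ents.append(part)
        if ids:
            slabs.append(build_ell(ents, ids))
    return slabs

def slabs_spmv(slabs: List[Slab], x: List[float], n_rows: int) -> List[float]:
    y = [0.0] * n_rows
    for s in slabs:
        ell_accumulate(s, x, y)
    return y

def fill_ratio(slabs: List[Slab]) -> float:
    """Real nonzeros / stored slots: a rough proxy for SIMD lane utilization."""
    stored = sum(s.width * len(s.rows) for s in slabs)
    return sum(s.nnz for s in slabs) / stored if stored else 1.0

def static_chunks(n_items: int, n_workers: int) -> List[Tuple[int, int]]:
    """Static equal split into contiguous [start, end) ranges (sizes differ by <= 1)."""
    q, r = divmod(n_items, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        end = start + q + (1 if w < r else 0)
        bounds.append((start, end))
        start = end
    return bounds

if __name__ == "__main__":
    # 4 x 8 example: row 0 is dense in columns 0-3; rows 1-3 hold one entry each.
    A: CSR = ([0, 4, 5, 6, 7], [0, 1, 2, 3, 5, 6, 7], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
    x = [float(v) for v in range(1, 9)]
    plain = [build_ell(csr_rows(A), list(range(4)))]
    blocked = blocked_ell(A, n_cols=8, block_width=4)
    print(csr_spmv(A, x))                           # [30.0, 30.0, 42.0, 56.0]
    print(slabs_spmv(blocked, x, 4))                # [30.0, 30.0, 42.0, 56.0]
    print(fill_ratio(plain), fill_ratio(blocked))   # 0.4375 1.0
    print(static_chunks(10, 4))                     # [(0, 3), (3, 6), (6, 8), (8, 10)]
```

On this toy matrix, plain ELLPACK pads every row to width 4, so only 7 of the 16 stored slots hold nonzeros. Splitting the columns into two blocks of width 4 removes all padding. Real matrices will generally not reach a fill ratio of 1.0. A production kernel for Xeon Phi would be written in C with vector intrinsics or compiler auto-vectorization rather than in Python.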
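[Author affiliation]: School of Computer Science and Technology, University of Science and Technology of China
[Funding]: Supported by the National High Technology Research and Development Program of China ("863" Program) (2012AA010901, 2012AA010902)
[CLC classification]: TP332; O241.6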


Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2462494.html
