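Evaluating the Intel AVX2 vgather Instruction Using Stencils
Published: 2018-01-28 02:56
Keywords: AVX; vgather instruction; Stencil; performance evaluation. Source: Computer Science (《计算机科学》), 2017, Issue 01. Paper type: journal article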
[Abstract]: To read non-contiguous data more efficiently during vectorization, Intel introduced the AVX2 vgather instruction on Haswell CPUs. Because stencil codes use conditional branches when setting boundary conditions, the compiler generates vgather instructions, which degrades stencil performance on Haswell. This paper proposes using peel optimization or intrinsic loads to avoid the generation of vgather instructions, and applies these methods to three stencil benchmarks, the long-range stencil program 3DFD, and the hybrid stencil application 3DEW. These stencils achieve speedups ranging from 1.22X to 3.88X on Haswell. A study of the instruction's implementation shows that a vgather instruction is decoded into multiple micro-operations (μops), with one μop generated for each element to be loaded. Because decoding vgather incurs a high overhead, the instruction becomes the performance bottleneck for stencils on Haswell. Understanding how the AVX2 vgather instruction is implemented, and knowing the optimizations that avoid generating it, provides a useful reference for tuning applications with good spatial locality on Haswell.
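The peel idea described in the abstract can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the paper's code: the 1D 3-point kernel, the clamped boundary rule, the coefficients, and the function names `stencil_branchy` and `stencil_peeled` are all hypothetical. Whether a given compiler actually emits vgather for the first variant depends on the compiler version and flags, so the sketch only shows the source-level transformation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1D 3-point stencil; kernel and names are illustrative only. */

/* Variant A: the boundary condition is tested inside the loop.
   The neighbor indices are chosen by a condition on i, so the compiler
   may not be able to prove that the loads are unit-stride. */
void stencil_branchy(const float *in, float *out, size_t n)
{
    assert(n >= 3);
    for (size_t i = 0; i < n; ++i) {
        size_t left  = (i == 0)     ? i : i - 1;  /* clamp at the left edge  */
        size_t right = (i == n - 1) ? i : i + 1;  /* clamp at the right edge */
        out[i] = 0.25f * in[left] + 0.5f * in[i] + 0.25f * in[right];
    }
}

/* Variant B: the two boundary iterations are peeled out of the loop.
   The interior loop has no condition and reads in[i-1], in[i], in[i+1],
   which are plain contiguous accesses. */
void stencil_peeled(const float *in, float *out, size_t n)
{
    assert(n >= 3);
    /* Peeled first iteration: the left neighbor is clamped to in[0]. */
    out[0] = 0.25f * in[0] + 0.5f * in[0] + 0.25f * in[1];

    /* Branch-free interior. */
    for (size_t i = 1; i + 1 < n; ++i)
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];

    /* Peeled last iteration: the right neighbor is clamped to in[n-1]. */
    out[n - 1] = 0.25f * in[n - 2] + 0.5f * in[n - 1] + 0.25f * in[n - 1];
}
```

Both variants compute the same values; only the placement of the boundary handling differs. Inspecting the generated assembly of each variant (for example, looking for `vgatherdps`) is one way to check whether a particular compiler falls back to gathers for the in-loop condition.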
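[Affiliations]: High Performance Computing Center, Shanghai Jiao Tong University; Global Scientific Information and Computing Center, Tokyo Institute of Technology; Software and Services Group, Intel Corporation
[Funding]: National Key Research and Development Program of China (2014AA01A302, 2016YFB0201800); Japan Society for the Promotion of Science (JSPS) RONPAKU Fellowship
[Classification]: TP332; TP314
[Text snapshot]: 1 Introduction. To read non-contiguous data more efficiently during vectorization, Intel has successively provided hardware-supported vgather instructions on different platforms: the IMCI (Initial Many Core Instructions) vgather instruction on Knights Corner (abbreviated KNC), released in the first half of 2013; and, on the Haswell (abbreviated HSW) CPU released in June 2013, the AVX (Advanced Vector Extension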
Article ID: 1469570
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1469570.html