Evaluating the Intel AVX2 vgather Instruction Using Stencils
Published: 2018-01-28 02:56
Keywords: AVX; vgather instruction; Stencil; performance evaluation. Source: Computer Science (《計算機科學》), 2017, Issue 01. Paper type: journal article
[Abstract]: To better support reading non-contiguous data during vectorization, Intel introduced the AVX2 vgather instruction on Haswell CPUs. Because stencil codes use conditional statements when setting boundary conditions, the compiler generates vgather instructions, which lowers stencil performance on Haswell. The paper proposes peel optimization or intrinsic loads to avoid generating vgather instructions, and applies the approach to three stencil benchmarks, the long-range stencil program 3DFD, and the hybrid stencil application 3DEW. On Haswell, these stencils achieve speedups ranging from 1.22X to 3.88X. A study of the instruction's implementation shows that a vgather instruction is decoded into multiple micro-operations (μops), with one μop generated for each element to be loaded. The high overhead of decoding vgather makes it the performance bottleneck for stencils on Haswell. Understanding how AVX2 vgather is implemented, and knowing the optimizations that avoid generating it, provides a useful reference for tuning applications with good spatial locality on Haswell.
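The peel optimization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not show the authors' code, so everything here is an assumption made for illustration: a 1D three-point averaging stencil with clamped boundaries, and the hypothetical function names `stencil_branchy` and `stencil_peeled`. In the first version the neighbor indices depend on a condition inside the loop, which is the kind of pattern a vectorizing compiler may turn into gather loads. In the second version the two boundary iterations are peeled out, so the interior loop has only unit-stride accesses. Whether a given compiler actually emits vgather for the first version depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: the boundary is handled by conditionals inside the loop.
 * The neighbor indices are selected per iteration, so the loads are
 * not obviously contiguous from the vectorizer's point of view. */
void stencil_branchy(const float *in, float *out, size_t n)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; ++i) {
        size_t left  = (i == 0)     ? i : i - 1;  /* clamp at the left edge  */
        size_t right = (i == n - 1) ? i : i + 1;  /* clamp at the right edge */
        out[i] = (in[left] + in[i] + in[right]) / 3.0f;
    }
}

/* Peeled: the first and last iterations are computed outside the loop.
 * The interior loop reads in[i-1], in[i], in[i+1] with unit stride and
 * contains no boundary condition. */
void stencil_peeled(const float *in, float *out, size_t n)
{
    assert(n >= 2);
    out[0] = (in[0] + in[0] + in[1]) / 3.0f;             /* left boundary  */
    for (size_t i = 1; i + 1 < n; ++i)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f; /* interior       */
    out[n - 1] = (in[n - 2] + in[n - 1] + in[n - 1]) / 3.0f; /* right boundary */
}
```

Both functions compute the same values. The peeled form only moves the boundary handling out of the hot loop, which is why it can be applied without changing results.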
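[Author affiliations]: Center for High Performance Computing, Shanghai Jiao Tong University; Global Scientific Information and Computing Center, Tokyo Institute of Technology; Software and Services Group, Intel Corporation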
[Funding]: National Key R&D Program of China (2014AA01A302, 2016YFB0201800); supported by a Japan Society for the Promotion of Science RONPAKU Fellowship
[Classification codes]: TP332; TP314
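[Body excerpt]: 1 Introduction. To better support reading non-contiguous data during vectorization, Intel has successively provided hardware-supported vgather instructions on different platforms: the IMCI (Initial Many Core Instructions) vgather instruction on Knights Corner (abbreviated KNC), released in the first half of 2013; and, on the Haswell (abbreviated HSW) CPU released in June 2013, the AVX (Advanced Vector Extension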
Article ID: 1469570
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/1469570.html