
Research on the Implementation and Optimization of Typical Image Processing Algorithms on the Xeon Phi Platform

Published: 2018-12-10 11:05
[Abstract]: With the rise of heterogeneous platforms, high-performance computing has developed rapidly. Heterogeneous CPU+GPU platforms are widely used in many fields, such as bioinformatics, medical imaging, and computational fluid dynamics. However, CPUs and GPUs use different instruction sets and programming models, which places high demands on program development and optimization. In 2012 Intel introduced the Xeon Phi coprocessor, based on a many-core architecture. It is compatible with the traditional x86 programming model and features, which to some extent lowers the difficulty of programming and optimization. Xeon Phi integrates more than 50 lightweight x86 cores, each supporting 4 hardware threads and 512-bit SIMD vector processing, and therefore offers strong parallel processing capability. At present, research on accelerating algorithms with Xeon Phi is still in its infancy. This thesis studies the implementation and acceleration of typical image processing algorithms on the Xeon Phi platform. Image processing algorithms demand high computational performance and are characterized by large data volumes and strict real-time requirements. Two representative algorithms are selected as case studies: the 2D IDCT and the 3D GVF field computation. The main work of this thesis includes:

(1) Implementing and optimizing the 2D IDCT on the Xeon Phi platform. A serial 2D IDCT is first implemented using row-column separation and serves as the performance baseline for the later optimizations. The serial version is then vectorized with 512-bit SIMD and scaled across threads with OpenMP, and finally data prefetching is applied. The experimental results show that, for single-precision image data, vectorization yields a speedup of about 5.84 times over the non-vectorized version, and performance grows approximately linearly with the number of threads. Data prefetching yields a further speedup of about 1.24 times on top of the existing optimizations. Overall, the best performance of the optimized 2D IDCT on Xeon Phi is about 1.53 times the best performance on a single E5-2670 CPU.

(2) Implementing and optimizing the 3D GVF field computation on the Xeon Phi platform. Beyond general optimizations such as vectorization and thread scaling, the work focuses on how stencil-computation optimization affects performance and proposes a loop blocking strategy that effectively improves cache utilization. The experimental results show that, for double-precision image data, thread scaling and vectorization significantly improve the performance of the 3D GVF field computation. With the proposed blocking strategy, for problem sizes of 256×256×256 and 512×512×512, 3D GVF on Xeon Phi achieves speedups of about 1.78 and 2.77 times, respectively, over a single E5-2670 CPU.

(3) Summarizing the optimization patterns of image processing algorithms on the Xeon Phi platform and collecting optimization techniques that can guide the optimization of other image processing algorithms. In general, compute-intensive algorithms obtain good performance gains directly from general techniques such as vectorization and thread scaling. Image processing algorithms with a low compute-to-memory-access ratio also need better cache utilization, and the loop blocking strategy proposed in this thesis is one effective way to achieve it.
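The row-column separation mentioned in item (1) can be illustrated with a minimal sketch. This is not the thesis code: the orthonormal scaling, the square block, and the function names `idct_1d`, `idct_2d_separable`, and `idct_2d_direct` are assumptions made for illustration, and plain Python stands in for the SIMD and OpenMP implementation the abstract describes. The sketch only shows why a 2D IDCT can be computed as a 1D IDCT over every row followed by a 1D IDCT over every column.

```python
import math


def idct_1d(coeffs):
    """1D inverse DCT (orthonormal scaling, an assumption of this sketch)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def idct_2d_separable(block):
    """2D inverse DCT by row-column separation: rows first, then columns."""
    rows = [idct_1d(row) for row in block]               # pass 1: every row
    cols = [idct_1d(list(col)) for col in zip(*rows)]    # pass 2: every column
    return [list(r) for r in zip(*cols)]                 # transpose back


def idct_2d_direct(block):
    """Direct quadruple-loop 2D inverse DCT on a square block, for comparison."""
    n = len(block)

    def scale(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for m in range(n):
        for j in range(n):
            total = 0.0
            for u in range(n):
                for v in range(n):
                    total += (scale(u) * scale(v) * block[u][v]
                              * math.cos(math.pi * (2 * m + 1) * u / (2 * n))
                              * math.cos(math.pi * (2 * j + 1) * v / (2 * n)))
            out[m][j] = total
    return out
```

For an N×N block the separable form needs on the order of N^3 multiply-adds instead of N^4 for the direct form. The two row and column passes are also independent across rows or columns, which is the kind of loop that vectorization and thread scaling can then be applied to.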
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree awarded]: 2013
[Classification number]: TP38; TP391.41

Article ID: 2370464


Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2370464.html

