Research on Key Techniques for Performance Optimization of Basic Image Processing Algorithms on GPUs

Posted: 2019-02-15 04:39

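[Abstract]: Image processing mainly comprises image compression, filtering, sampling, segmentation, and analysis. It plays an important role in many practical fields: widely used applications such as image recognition and face recognition are, at their technical core, image processing problems. As the scale of image processing in these fields keeps growing and real-time requirements keep tightening, improving the performance of image processing algorithms has become an active research topic. GPUs (Graphics Processing Units) offer far greater processing power and memory bandwidth than CPUs, and their development provides a way to meet the real-time requirements of image processing applications. Given OpenCL's cross-platform nature, this thesis uses OpenCL to implement parallel image processing operations on the GPU, which can substantially improve algorithm performance and addresses the problems above. Moreover, image processing algorithms typically involve large volumes of data and are both compute- and memory-access-intensive, so parallelizing them on the GPU is a practical solution. This thesis therefore studies how to implement and optimize image processing algorithms on GPUs.

Because GPU architectures are complex and their hardware resources are limited, performance optimization is both the main difficulty and the main focus of GPU programming. In essence, GPU optimization means mapping the characteristics of an algorithm efficiently onto the characteristics of the underlying hardware architecture. This thesis combines the properties of image processing algorithms with those of the underlying hardware to study performance optimization on GPU computing platforms. It covers commonly used types of image processing algorithms, including up-sampling, down-sampling, reduction, horizontal filtering, vertical filtering, convolution, and overshoot control. Since these algorithms differ in their compute and memory-access characteristics, the thesis draws on the features of the GPU hardware platform to summarize, for algorithms with different characteristics, their performance bottlenecks on the GPU and the corresponding optimization methods, from the perspectives of data transfer optimization, memory access optimization, NDRange optimization, instruction stream optimization, data sharing optimization, and data dependency optimization.

The main contributions are as follows:

1) Implementation and optimization of the composite Sharpness image processing algorithm on a GPU computing platform. The basic image processing algorithms contained in Sharpness are analyzed, both algorithmically and for parallelism. The GPU optimizations applied to Sharpness include data transfer optimization, kernel fusion, reduction optimization, vectorization and data localization, boundary optimization, and other basic techniques. The optimization of Sharpness with SIMD is also studied, and the performance of the CPU version, the SIMD-optimized version, and the GPU-optimized version is compared and analyzed.

2) Analysis of the composite Laplacian image processing algorithm, again covering algorithmic and parallelism analysis of the basic image processing algorithms it contains. The GPU optimizations applied to Laplacian include kernel fusion; reducing global synchronization and streamlining the algorithm; and adding padding, reducing conditional checks, and resolving data alignment issues. The SIMD optimization of Laplacian is also described, and the CPU, SIMD-optimized, and GPU-optimized versions are compared and analyzed.

The experimental results show that GPU acceleration gives image processing algorithms a substantial performance advantage, and that, because of the characteristics of GPU hardware architectures, tailoring algorithms ported to the GPU to the target architecture has a significant effect on performance.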
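The abstract lists reduction optimization among the Sharpness optimizations but does not describe it. As an illustration only, the following plain-Python sketch (not the thesis's OpenCL code) emulates the common two-stage work-group tree reduction: each work-item loads one value into local memory, the active range is halved at every step with a barrier between steps, and the per-group partial sums are combined in a second stage. The work-group size `WG_SIZE` and all function names are hypothetical.

```python
# Sketch: emulates an OpenCL-style two-stage tree reduction (sum) in plain Python.
# WG_SIZE and all names are hypothetical; this is not the thesis's kernel code.

WG_SIZE = 8  # hypothetical work-group size; must be a power of two


def reduce_work_group(data, group_id):
    """Emulate one work-group summing its slice of `data` in 'local memory'."""
    base = group_id * WG_SIZE
    # Each work-item (lid) loads one element; out-of-range items load the
    # identity value 0 so the tree below needs no bounds checks.
    local_mem = [data[base + lid] if base + lid < len(data) else 0
                 for lid in range(WG_SIZE)]
    stride = WG_SIZE // 2
    while stride > 0:
        # Work-items with lid < stride add their partner's value. Reads
        # (lid + stride) never overlap writes (lid), so the order is irrelevant.
        for lid in range(stride):
            local_mem[lid] += local_mem[lid + stride]
        # On a GPU, barrier(CLK_LOCAL_MEM_FENCE) would separate these steps.
        stride //= 2
    return local_mem[0]  # work-item 0 holds the group's partial result


def reduce_sum(data):
    """Stage 1: one partial sum per work-group; stage 2: combine the partials."""
    num_groups = (len(data) + WG_SIZE - 1) // WG_SIZE
    partials = [reduce_work_group(data, g) for g in range(num_groups)]
    return sum(partials)


if __name__ == "__main__":
    pixels = list(range(1, 21))  # 20 hypothetical pixel values
    print(reduce_sum(pixels))    # 210
```

Loading zeros for out-of-range work-items is itself a small instance of padding: it keeps the tree loop free of bounds checks. Whether the thesis combines partial sums with a second kernel launch or on the host is not stated in the abstract.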
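Kernel fusion is applied to both Sharpness and Laplacian, and padding is used for Laplacian to reduce conditional checks. The following plain-Python sketch, an illustration under stated assumptions rather than the thesis's OpenCL implementation, shows both ideas on a hypothetical separable 3-tap [1, 2, 1] filter applied horizontally and then vertically. Padding the input with replicated border pixels lets every output pixel run the same branch-free stencil, and the fused version computes each output pixel in one step, so the horizontally filtered intermediate image is never written to and re-read from global memory. The filter taps, the replicate-border policy, and all function names are assumptions.

```python
# Sketch: padding + kernel fusion for a separable 3-tap filter, in plain Python.
# Taps (1, 2, 1), replicate padding, and all names are illustrative assumptions.

TAPS = (1, 2, 1)       # hypothetical filter; 2-D weights sum to 16 (left unnormalized)
R = len(TAPS) // 2     # filter radius = padding width


def pad_replicate(img, r=1):
    """Pad a 2-D list by r pixels on every side, replicating edge values."""
    h, w = len(img), len(img[0])
    return [[img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]
             for x in range(-r, w + r)]
            for y in range(-r, h + r)]


def two_pass(img):
    """Unfused: a horizontal 'kernel' writes an intermediate image that a
    vertical 'kernel' then reads (two launches, extra global-memory traffic)."""
    p = pad_replicate(img, R)
    h, w = len(img), len(img[0])
    # Horizontal pass over all padded rows so the vertical pass needs no checks.
    tmp = [[sum(t * p[y][x + k] for k, t in enumerate(TAPS)) for x in range(w)]
           for y in range(h + 2 * R)]
    return [[sum(t * tmp[y + k][x] for k, t in enumerate(TAPS)) for x in range(w)]
            for y in range(h)]


def fused(img):
    """Fused: each output pixel (one work-item) applies both stencils itself;
    no intermediate image is stored."""
    p = pad_replicate(img, R)
    h, w = len(img), len(img[0])
    n = len(TAPS)
    out = [[0] * w for _ in range(h)]
    for y in range(h):          # typically get_global_id(1) in an OpenCL kernel
        for x in range(w):      # typically get_global_id(0)
            # Branch-free stencil: padding guarantees every read is in range.
            out[y][x] = sum(TAPS[j] * TAPS[i] * p[y + j][x + i]
                            for j in range(n) for i in range(n))
    return out


if __name__ == "__main__":
    image = [[0, 0, 0, 0],
             [0, 16, 0, 0],
             [0, 0, 0, 0]]
    assert fused(image) == two_pass(image)  # same result, no intermediate image
    print(fused(image)[1])  # [32, 64, 32, 0]; divide by 16 to normalize
```

In this naive form the fused version does 9 multiply-adds per output pixel instead of about 6, in exchange for one fewer kernel launch and one fewer round trip of the image through global memory. Whether that trade pays off depends on whether the kernel is memory-bound, which is the kind of algorithm-to-hardware mapping question the abstract describes.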
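[Degree-granting institution]: University of Chinese Academy of Sciences (School of Engineering Management and Information Technology, Chinese Academy of Sciences)
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TP391.41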
