GPU Parallel Optimization of a Kakadu-Based JPEG2000 Decoding System
Topics: GPU + JPEG2000; Source: Xidian University, master's thesis, 2014
[Abstract]: JPEG2000 is an image compression standard based on the wavelet transform. Because of its good compression performance at low bit rates, its support for progressive transmission and region-of-interest coding, and its robustness, it is widely used in remote sensing, aerospace, medicine, the military, meteorology, and other fields. Kakadu is one of the most efficient JPEG2000 implementations currently available. Its distinctive three-layer architecture greatly simplifies image encoding and decoding, and its object-oriented implementation makes it highly reusable. However, as technology advances, applications demand fast image decompression, especially in aerospace and the military, and current CPU-based solutions are costly and inefficient and struggle to meet practical requirements. In a CPU, caches and control units consume most of the transistor budget, whereas a graphics processing unit (GPU) devotes more of its transistors to arithmetic logic, which gives it a large advantage in computing power over a CPU and makes it well suited to large-scale parallel computation. To improve the efficiency of Kakadu-based image decompression and meet practical requirements, this thesis proposes a GPU parallel optimization scheme for a Kakadu-based JPEG2000 decoding system, in which the core decoding stages of Kakadu are implemented on the GPU using high-performance parallel computing techniques.
The thesis introduces the JPEG2000 image compression standard, the development of GPUs, and CUDA programming, and then carries out the GPU parallel optimization of the Kakadu-based JPEG2000 decoding system. The main work is as follows:
1. High-performance parallel implementation of Tier-2. The Tier-2 module has three main parts: packet header parsing, tile header parsing, and codestream organization. A thread-level parallel scheme is adopted: in codestream organization, different GPU threads move the bits at different positions, and the threads run in parallel with one another.
2. High-performance parallel implementation of Tier-1. Tier-1 decoding is parallelized at the code-block level, since code blocks are independent of one another. One GPU thread block decodes one image, and each thread within that block decodes one code block of the image, so that code blocks are decoded in parallel.
3. High-performance parallel implementation of the inverse wavelet transform. Rows within an image are processed in parallel, while successive images are processed serially. The inverse wavelet transform consists of four steps: pre-scaling, vertical filtering, horizontal filtering, and post-scaling. Each step is accelerated through thread-level parallelism. The number of GPU thread blocks is set to the number of image rows: one thread block processes one row, giving parallelism across rows, and one thread processes one pixel, giving parallelism across pixels.
After the GPU parallel optimization, the decompressed images have the same quality as those produced before optimization, and with image quality preserved the decoding speed improves by a factor of 2 to 4. The optimization greatly accelerates the decoding system as a whole, raises the throughput of JPEG2000 image decompression, and meets the need for real-time decoding of large volumes of image data.
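The row-and-pixel mapping described for the inverse wavelet transform can be illustrated with a small sequential emulation. This is only a sketch under stated assumptions, not the thesis's implementation: the abstract does not name the wavelet filter, so the reversible 5/3 lifting filter is assumed here, the scaling steps are omitted, only the horizontal pass is shown, and all function names are hypothetical. The outer loop stands in for thread blocks (one per row) and the inner loops stand in for threads (one per pixel).

```python
# Sequential emulation of "one thread block per row, one thread per pixel".
# ASSUMPTION: reversible 5/3 lifting filter; the abstract does not name the filter.
# All function names are hypothetical and exist only for this illustration.

def forward_53_row(x):
    """Forward 5/3 lifting on one even-length row; returns (low, high)."""
    half = len(x) // 2
    # Predict step: odd sample minus the floor-average of its even neighbours.
    # The right neighbour is mirrored at the row boundary.
    high = [x[2 * n + 1] - ((x[2 * n] + x[min(2 * n + 2, len(x) - 2)]) >> 1)
            for n in range(half)]
    # Update step: even sample plus a rounded quarter of the neighbouring details.
    low = [x[2 * n] + ((high[max(n - 1, 0)] + high[n] + 2) >> 2)
           for n in range(half)]
    return low, high

def even_pixel_thread(low, high, out, n):
    """Work of one emulated thread in pass 1: rebuild even pixel 2n."""
    out[2 * n] = low[n] - ((high[max(n - 1, 0)] + high[n] + 2) >> 2)

def odd_pixel_thread(high, out, n):
    """Work of one emulated thread in pass 2: rebuild odd pixel 2n+1."""
    right = min(2 * n + 2, len(out) - 2)  # mirrored at the row boundary
    out[2 * n + 1] = high[n] + ((out[2 * n] + out[right]) >> 1)

def inverse_horizontal(subbands):
    """Rebuild every row from its (low, high) pair."""
    image = []
    for low, high in subbands:          # plays the role of the block index: one row
        out = [0] * (2 * len(low))
        for n in range(len(low)):       # plays the role of the thread index, pass 1
            even_pixel_thread(low, high, out, n)
        # On a GPU, a block-wide barrier would be needed here, because
        # pass 2 reads even pixels written by other threads in pass 1.
        for n in range(len(high)):      # pass 2
            odd_pixel_thread(high, out, n)
        image.append(out)
    return image

rows = [[10, 12, 9, 7, 30, 31], [0, 255, 0, 255, 1, 2]]
restored = inverse_horizontal([forward_53_row(r) for r in rows])
```

Within each pass no emulated thread reads a value written by another thread of the same pass, which is what allows the pixels of a row to be processed concurrently; rows never depend on one another in the horizontal pass, which is what allows one block per row.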
[Degree-granting institution]: Xidian University
[Degree level]: Master's
[Year awarded]: 2014
[Classification number]: TN919.81
[Citation]: Han Xiaoqing; GPU Parallel Optimization of a Kakadu-Based JPEG2000 Decoding System [D]; Xidian University; 2014
Article ID: 1942208
Article link: http://sikaile.net/kejilunwen/wltx/1942208.html