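Research and Implementation of a CUDA-Based H.264 Video Decoding Algorithm
Keywords: H.264 decoder; CUDA parallel computing; inverse transform; inverse quantization; intra prediction; inter prediction; loop filtering. Source: Nanjing University of Science and Technology, 2014 master's thesis. Document type: degree thesis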
[Abstract]: The H.264 video coding standard, jointly published by the ITU-T and ISO/IEC standardization bodies in 2003, offers the best compression performance among the practical video coding standards in use today. That performance comes at the cost of higher algorithmic complexity, so improving decoding efficiency without degrading decoded image quality is a research direction shared by many scholars.
In recent years, the rapid development of the graphics processing unit (GPU) has led to its gradual adoption for general-purpose computing. The Compute Unified Device Architecture (CUDA), introduced by NVIDIA in 2007, provides a good hardware and software development environment for general-purpose GPU computing.
This thesis proposes implementing the H.264 video decoding algorithm on the CUDA architecture. The serial H.264 decoder is partitioned into tasks: the CPU handles bitstream parsing, entropy decoding, reordering, data transfer to and from the GPU, and memory allocation, while the GPU handles the parallel implementation of the inverse transform, inverse quantization, intra prediction, inter prediction, and loop filtering modules.
The thesis analyzes the inverse transform, inverse quantization, intra prediction, inter prediction, and loop filtering modules and proposes an efficient parallel algorithm for each: a fully parallel inverse quantization algorithm; a parallel butterfly inverse transform and a fully parallel inverse transform; a locally parallel intra prediction algorithm, which is then further optimized; an efficient fully parallel inter prediction algorithm; and fully parallel algorithms for both the filter-strength computation stage and the filtering stage of the loop filter.
Experiments show that, in the same hardware and software environment and with the quality of the reconstructed image essentially preserved, the proposed CUDA-based H.264 decoding algorithm achieves a 10x speedup over the current FFmpeg serial decoder.
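The abstract names inverse quantization and the inverse transform as two of the GPU-side modules. As a rough illustration of why these stages parallelize well, the sketch below dequantizes and inverse-transforms 4x4 residual blocks using the core integer transform defined in the H.264 standard. It is not the thesis's kernel: it is plain Python rather than CUDA, it assumes flat scaling matrices and ignores the separate DC paths for Intra16x16 luma and chroma, and the function names (`dequantize`, `butterfly`, `inverse_transform`, `reconstruct_residuals`) are hypothetical. The point is that every coefficient is scaled independently and every block is processed independently, so each could in principle be assigned to its own GPU thread.

```python
# normAdjust values from the H.264 standard, indexed by qP % 6.
# Tuple order: (i and j both even, i and j both odd, all other positions).
V = [(10, 16, 13), (11, 18, 14), (13, 20, 16),
     (14, 23, 18), (16, 25, 20), (18, 29, 23)]


def level_scale(qp, i, j):
    """Scaling factor for coefficient (i, j), assuming flat scaling matrices."""
    both_even, both_odd, mixed = V[qp % 6]
    if i % 2 == 0 and j % 2 == 0:
        return both_even
    if i % 2 == 1 and j % 2 == 1:
        return both_odd
    return mixed


def dequantize(coeffs, qp):
    """Scale each of the 16 coefficients independently (no data dependencies)."""
    shift = qp // 6
    return [[(coeffs[i][j] * level_scale(qp, i, j)) << shift for j in range(4)]
            for i in range(4)]


def butterfly(x0, x1, x2, x3):
    """One 4-point inverse butterfly: additions and shifts only."""
    e0 = x0 + x2
    e1 = x0 - x2
    e2 = (x1 >> 1) - x3
    e3 = x1 + (x3 >> 1)
    return [e0 + e3, e1 + e2, e1 - e2, e0 - e3]


def inverse_transform(d):
    """Row pass, then column pass, then rounding by (x + 32) >> 6."""
    rows = [butterfly(*row) for row in d]
    # cols[j][i] holds the result for row i, column j.
    cols = [butterfly(*[rows[i][j] for i in range(4)]) for j in range(4)]
    return [[(cols[j][i] + 32) >> 6 for j in range(4)] for i in range(4)]


def reconstruct_residuals(blocks, qp):
    """Blocks are independent; on a GPU each could map to its own thread(s)."""
    return [inverse_transform(dequantize(block, qp)) for block in blocks]
```

For example, a block whose only non-zero coefficient is a DC value of 4 at qP = 28 reconstructs to a flat residual of 16 in all sixteen positions. The four row butterflies have no dependencies on one another, and neither do the four column butterflies, which is the property a butterfly-level parallel scheme would exploit.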
[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TN919.81
Article ID: 1449284
Article link: http://sikaile.net/kejilunwen/wltx/1449284.html