Optimization and Parallelization of Inter-Frame Prediction Based on CUDA

Published: 2019-06-10 09:52
[Abstract]: Inter-frame prediction is a core module of the H.264/AVC coding framework. It improves compression efficiency through multi-frame prediction, sub-pixel motion estimation, and mode decision based on rate-distortion optimization, but these techniques also make the module slow and resource-intensive. Meanwhile, the continued development of CUDA (Compute Unified Device Architecture), the GPU-based parallel programming framework, has turned the GPU into a second programmable execution unit in the computer, and GPU computing power in scientific computing now far exceeds that of the CPU. How to accelerate inter-frame prediction on the CUDA platform, and thereby improve overall encoding efficiency, has therefore become an active research topic in multimedia technology and high-performance computing.

Statistics gathered over video data of various resolutions and frame rates show that, during inter-frame prediction coding, motion vectors follow consistent trends in both local and global regions, and that the motion vectors of coding blocks in different modes are strongly correlated. Based on these observations and the characteristics of the CUDA platform, this work optimizes the serial inter-frame prediction module at the level of both the overall framework and the core algorithms: (1) the inter-frame prediction module is divided on the CUDA platform into sub-modules including an interpolation filtering module, a motion estimation module, and a multi-mode motion vector synthesis module; (2) because traditional full search is blind in its search mechanism, and the many conditional branches of fast search algorithms make it hard to fully use the computing resources of the CUDA platform, a motion-trend-oriented adaptive iterative search algorithm is proposed and implemented; (3) to reduce the per-thread computing load and make full use of neighborhood motion information while avoiding the low concurrency caused by data dependencies, a pre-search mechanism based on domain partitioning and double sampling is proposed and implemented; (4) exploiting the inter-layer correlation of motion vectors, an optimal motion vector merging mechanism based on inter-layer coding blocks is proposed and implemented.

Experimental results show that, compared with the full search algorithm, the motion-trend-oriented iterative search algorithm achieves a 70 to 80 times speedup while the SNR is kept below 0.5 dB; compared with fast search algorithms, it achieves a 3 to 4 times speedup with a higher compression ratio; and compared with motion estimation algorithms based on the CUDA platform, it improves encoding efficiency by about 20%.
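
To make the contrast between full search and an iterative, prediction-seeded search concrete, here is a minimal CPU-side sketch in Python. It is not the thesis's algorithm, which the abstract does not specify: `full_search` is the standard exhaustive block-matching baseline using the sum of absolute differences (SAD), and `iterative_search` is a generic greedy refinement that starts from a predicted vector (for example, one taken from a neighboring block). All function names, the integer-pixel precision, and the 4-neighbor step pattern are assumptions made for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by the candidate motion vector (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def in_bounds(ref, bx, by, dx, dy, n):
    """True if the displaced n x n block lies entirely inside `ref`."""
    h, w = len(ref), len(ref[0])
    return 0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h


def full_search(cur, ref, bx, by, n, radius):
    """Baseline: test every integer displacement within +/- radius.
    Returns (cost, dx, dy) for the lowest SAD, or None if no candidate fits."""
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if not in_bounds(ref, bx, by, dx, dy, n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def iterative_search(cur, ref, bx, by, n, radius, start=(0, 0)):
    """Illustrative greedy refinement (an assumption, not the thesis method).
    Starting from a predicted vector `start`, which must itself be in bounds,
    step to any of the 4 neighbors that lowers the SAD until none does."""
    dx, dy = start
    cost = sad(cur, ref, bx, by, dx, dy, n)
    improved = True
    while improved:
        improved = False
        for sx, sy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = dx + sx, dy + sy
            if max(abs(nx), abs(ny)) > radius:
                continue  # stay inside the search window
            if not in_bounds(ref, bx, by, nx, ny, n):
                continue  # stay inside the reference frame
            c = sad(cur, ref, bx, by, nx, ny, n)
            if c < cost:
                cost, dx, dy = c, nx, ny
                improved = True
    return (cost, dx, dy)
```

Full search evaluates (2·radius + 1)² candidates per block with no data-dependent branching, which maps easily onto GPU threads but wastes work. The greedy variant evaluates far fewer candidates when the starting prediction is good, at the cost of possibly stopping in a local minimum on textured content. That trade-off is the general motivation for trend-guided search schemes like the one the abstract describes.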

Article ID: 2496375


Article link: http://sikaile.net/kejilunwen/wltx/2496375.html

