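Optimization and Parallelization Design of HEVC-Based Intra Prediction Algorithms
Published: 2018-06-04 02:44
Topics: HEVC + intra prediction; Source: Xi'an University of Posts and Telecommunications, 2016 master's thesis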
[Abstract]: HEVC (High Efficiency Video Coding), the new-generation high-efficiency video coding standard, was first proposed in January 2010 by the Joint Collaborative Team on Video Coding (JCT-VC). Its core goal is to double compression efficiency relative to H.264/AVC. To reach that goal, HEVC must adopt more complex video coding and decoding algorithms, which brings very high computational complexity.

Building on an in-depth study of the intra prediction algorithm, this thesis targets its two main processes, block partitioning and prediction mode selection, and gives one optimization scheme for each: a fast coding unit (CU) partitioning algorithm based on rate-distortion optimization, suited to scenarios with high video quality requirements, and a fast intra prediction mode selection algorithm based on mode grouping, suited to scenarios with strict real-time requirements. Both algorithms effectively reduce computational complexity and improve the efficiency of intra coding.

In addition, the serial operation of the intra prediction algorithm in HEVC and the queued processing of its 35 prediction modes affect prediction coding time and performance. Using DPR-CODEC, a dynamically programmable and reconfigurable array processor for video coding and decoding developed in-house at Xi'an University of Posts and Telecommunications, this thesis proposes parallelization schemes for reference pixel smoothing and for fast prediction mode selection. These schemes effectively reduce both the number of clock cycles spent on data loading when a single processing element operates serially and the total time needed for mode prediction, improving computational efficiency.

The specific work is as follows:

1. Improved HEVC coding unit partitioning. A statistical study of rate-distortion (RD) costs across depths shows that CUs that are not split tend to have small RD costs, whereas CUs that are split have larger RD costs that are distributed fairly uniformly. Based on this property, the thesis gathers statistics on the thresholds derived from the RD cost probability distributions under different quantization parameters, obtains threshold equations for each depth in the partitioning of the largest coding unit (Largest CU, LCU), and uses these thresholds to terminate CU partitioning early, thereby reducing computational complexity. Experiments show that, compared with the HEVC test model HM10.0, the improved method increases bit rate by 0.5%, lowers Y-PSNR by 0.019 dB, and reduces encoding time by 26.7%.

2. A fast intra prediction mode selection algorithm based on mode grouping. Exploiting the strong correlation between the first-ranked mode in the candidate mode set and the optimal prediction mode, the thesis applies a first-pass screening and a second-pass screening to the 35 prediction modes to find, quickly and accurately, the candidate modes most likely to be optimal. This greatly reduces the number of modes that enter the RDO process and effectively lowers the computational complexity of the original intra prediction coding algorithm. Experiments show that the algorithm reduces encoding time by 41.8% while keeping video quality and bit rate essentially unchanged.

3. Parallelized reference pixel smoothing. The HEVC test model is designed for single-processor systems. When reference pixel smoothing runs serially, the filtering of each pixel is held up by the data loads of the others, so the processor cannot work quickly. The thesis therefore loads all relevant pixels at once and then performs the filtering computation over them together, which completes the parallelized design of reference pixel smoothing. Simulation shows a serial-to-parallel speedup of 14.43 for this scheme.

4. Parallelized fast prediction mode selection. Taking into account the resource limits and computational efficiency of DPR-CODEC, intra prediction uses the correlation between prediction direction and the dominant texture direction of the image to select the modes whose prediction directions are most likely to occur. The parallelization works as follows: each cluster performs the prediction computation for 16 pixels of the prediction block simultaneously, and each PE independently computes 12 modes. This removes the mutual waiting between prediction mode computations that occurs in serial mode and allows prediction mode selection to proceed in parallel across multiple pixels. Simulation shows that the parallelized mode prediction scheme reaches a serial-to-parallel speedup of 7.60, improving computational efficiency.
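The "load everything once, then filter together" idea behind the reference pixel smoothing design can be illustrated with a minimal Python sketch. It assumes the three-tap [1, 2, 1]/4 smoothing filter usually described for HEVC intra reference samples, with the two end samples left unfiltered. The function names are hypothetical, and the sketch does not model DPR-CODEC, its clusters, or its PEs; the abstract does not describe how the thesis maps the filter onto that hardware.

```python
def smooth_serial(ref):
    """Filter one sample at a time, fetching its neighbours on demand."""
    out = list(ref)
    for i in range(1, len(ref) - 1):
        # Three separate loads per output sample (illustrative of the serial case).
        left, centre, right = ref[i - 1], ref[i], ref[i + 1]
        out[i] = (left + 2 * centre + right + 2) >> 2  # [1, 2, 1] / 4 with rounding
    return out


def smooth_bulk(ref):
    """Load every sample once, then filter all interior samples in one pass."""
    loaded = tuple(ref)  # single bulk load of the whole reference line
    if len(loaded) < 3:
        return list(loaded)  # nothing to filter: only end samples exist
    # Each output depends only on the loaded snapshot, so the outputs are
    # independent of one another and could be computed in parallel.
    interior = [
        (left + 2 * centre + right + 2) >> 2
        for left, centre, right in zip(loaded, loaded[1:], loaded[2:])
    ]
    return [loaded[0], *interior, loaded[-1]]


if __name__ == "__main__":
    samples = [10, 20, 40, 80, 80]
    print(smooth_serial(samples))
    print(smooth_bulk(samples))
```

Both functions produce the same filtered line. The difference is only in structure: the bulk version separates data loading from arithmetic, which is the property a parallel implementation relies on.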
[Degree-granting institution]: Xi'an University of Posts and Telecommunications
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TN919.81
Article ID: 1975501
Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/1975501.html