Research on Efficient Video Coding Technology Based on Visual Characteristics
Published: 2019-04-30 09:20
【Abstract】: With the rapid development of mobile communication and unmanned aerial vehicle (UAV) technology, multimedia network technology built around digital video is no longer confined to traditional television systems. As UAVs play an increasingly prominent role in natural-disaster monitoring, commercial performances, and military support, the demands placed on digital video coding technology keep rising. Traditional video coding has made great progress in removing spatial, temporal, and statistical (entropy) redundancy, but has achieved little in removing visual redundancy. Ultimately, the human eye is the final receiver of the video signal. Targeting the scenario of a single viewer watching video over a poor network connection, this thesis therefore studies the characteristics of the human visual system in depth and, drawing on the idea of multiple description coding, proposes a coding system that splits the video information into three streams transmitted simultaneously. Experimental results show that when a conventional H.264-based coding system and the proposed system operate at similar, insufficient bit rates, the proposed system delivers better visual quality; when their visual quality is similar, the proposed system saves roughly 20% of the bit rate.

This thesis studies several bottom-up saliency-map models and, based on their respective strengths and weaknesses and on the requirements of the target application, selects the frequency-tuned saliency model. On this basis, an improved saliency model is proposed that balances the contributions of image luminance and chrominance. Experimental results show that the improved model detects saliency maps more accurately and effectively, and it is then used to extract the image's region of interest.

The just-noticeable distortion (JND) model represents visual perceptual redundancy with a quantitative threshold: changes no larger than this threshold cannot be perceived by the human eye, so any imperceptible difference in information need not be encoded into the video bitstream. This thesis studies the model's contrast masking, background-luminance masking, and temporal masking effects, and implements the model in the final system.

The visual attention model exploits the highly non-uniform distribution of cone cells on the retina: cell density is highest at the fovea and falls off rapidly with distance from it. Consequently, the human visual system has its highest spatial resolution at the point of fixation, and resolution drops sharply as an image point moves away from that point. Building on this, the thesis combines the visual attention model with the JND model to implement a content-based visual attention model, and uses motion vectors to realize dynamic shifting of the visual fixation point.
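As an illustration of the frequency-tuned saliency approach selected above, the sketch below computes saliency as the distance of each pixel's blurred Lab colour from the frame's mean colour, with separate luminance and chrominance weights standing in for the thesis's (unspecified) luminance/chrominance balancing. The weight values, the 5x5 Gaussian kernel, and the mean-plus-standard-deviation ROI threshold are illustrative assumptions, not parameters taken from the thesis.

```python
import cv2
import numpy as np


def ft_saliency(bgr, w_lum=0.5, w_chroma=1.0):
    """Frequency-tuned saliency (Achanta et al., 2009) with an adjustable
    luminance/chrominance balance standing in for the thesis's improved model."""
    # Smooth the image to suppress fine texture and noise.
    blurred = cv2.GaussianBlur(bgr, (5, 5), 0)
    lab = cv2.cvtColor(blurred, cv2.COLOR_BGR2LAB).astype(np.float32)
    # Mean Lab colour of the whole frame serves as the "background" reference.
    mean_lab = lab.reshape(-1, 3).mean(axis=0)
    diff = lab - mean_lab
    # Saliency = weighted Euclidean distance from the mean colour, so the
    # relative influence of luminance (L) versus chrominance (a, b) is tunable.
    sal = np.sqrt(w_lum * diff[..., 0] ** 2
                  + w_chroma * (diff[..., 1] ** 2 + diff[..., 2] ** 2))
    return cv2.normalize(sal, None, 0.0, 1.0, cv2.NORM_MINMAX)


def roi_mask(saliency, k=1.0):
    # Pixels whose saliency exceeds mean + k*std form the region of interest.
    return (saliency >= saliency.mean() + k * saliency.std()).astype(np.uint8)
```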
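The abstract does not give the thesis's exact JND formulation, so the following sketch uses the widely cited piecewise background-luminance masking threshold of Chou and Li (1995) to show how a per-pixel visibility threshold can prune imperceptible residual differences before encoding. Treating the raw pixel value as the local background luminance is a simplification made for brevity.

```python
import numpy as np


def luminance_jnd(gray):
    """Per-pixel visibility threshold from background-luminance masking,
    following the piecewise model of Chou & Li (1995). `gray` is an 8-bit
    luminance image; using the pixel value itself as the local background
    luminance (instead of a 5x5 weighted mean) keeps the sketch short."""
    bg = gray.astype(np.float32)
    dark = 17.0 * (1.0 - np.sqrt(bg / 127.0)) + 3.0      # dark backgrounds
    bright = 3.0 / 128.0 * (bg - 127.0) + 3.0            # bright backgrounds
    return np.where(bg <= 127.0, dark, bright)


def prune_residual(residual, jnd):
    # Differences the eye cannot perceive need not be coded: zero out any
    # residual sample whose magnitude stays below the JND threshold.
    return np.where(np.abs(residual) <= jnd, 0.0, residual)
```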
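The eccentricity-dependent resolution falloff described above can be turned into a per-pixel weight map for bit allocation. The sketch below uses a Geisler-Perry-style sensitivity falloff w = e2 / (e2 + e) and shifts the fixation point with the median motion vector; the viewing distance, the half-resolution eccentricity e2, and the median-vector heuristic are assumptions for illustration rather than the thesis's method.

```python
import numpy as np


def foveation_weights(shape, fixation, view_dist_px=2048.0, e2=2.3):
    """Relative spatial-resolution weight w = e2 / (e2 + e), where e is the
    retinal eccentricity of a pixel in degrees. `fixation` is (row, col);
    the viewing distance in pixels and the half-resolution eccentricity e2
    are illustrative defaults, not values taken from the thesis."""
    rows, cols = np.indices(shape)
    dist_px = np.hypot(rows - fixation[0], cols - fixation[1])
    ecc_deg = np.degrees(np.arctan(dist_px / view_dist_px))
    return e2 / (e2 + ecc_deg)


def shift_fixation(prev_fixation, motion_vectors, gain=1.0):
    """Move the fixation point along the dominant scene motion, approximated
    here by the median of the block motion vectors (shape: N x 2, in pixels)."""
    mv = np.median(np.asarray(motion_vectors, dtype=np.float32).reshape(-1, 2), axis=0)
    return (prev_fixation[0] + gain * mv[0], prev_fixation[1] + gain * mv[1])
```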
【Degree-Granting Institution】: 電子科技大學(xué) (University of Electronic Science and Technology of China)
【Degree Level】: Master's
【Year of Degree Award】: 2017
【Classification Number】: TN919.81
Document No.: 2468668
Link to this article: http://sikaile.net/kejilunwen/xinxigongchenglunwen/2468668.html