Research on Sign Language Video Coding Methods under Energy-Constrained Conditions

Published: 2018-10-15 17:57

[Abstract]: Sign language is a visual language that conveys meaning through hand shapes and arm movements, supplemented by facial expressions, lip movements, and other body postures; it is the most natural way for deaf people to communicate. Unlike head-and-shoulder video, sign language video adds hand shapes and arm movements and exhibits hand-face occlusion, so it is more complex and harder to study. Compared with research on sign language video recognition and synthesis, research on sign language video coding is still scarce, and most of it is based on rate-distortion (R-D) theory: taking a given bit rate as the constraint, it studies the relationship between bit rate and distortion so as to minimize the distortion of the reconstructed sign language video. However, with the rapid growth of wireless network bandwidth and the wide adoption of the new-generation video coding standard H.264, the bit-rate constraint has become progressively weaker, while the power constraint on wireless video terminals has become progressively stronger. How to minimize the distortion of encoded sign language video under the limited energy of a wireless video terminal, reducing energy consumption and extending the battery renewal cycle, has therefore become a pressing problem. This thesis studies sign language video coding under energy-constrained conditions in depth. The aim is to use the visual selective attention mechanism of deaf viewers, power-rate-distortion theory, and region-of-interest energy-allocation video coding to achieve a dynamically balanced optimization among coding power consumption, bit rate, and coding distortion; to reduce the overall power consumption of the wireless video terminal as far as possible and extend the battery renewal cycle while preserving the subjective and objective coding quality of the sign language video; and to provide new theory and new methods for optimal parameter configuration and resource allocation in sign language video coding under energy constraints. The main work of the thesis is as follows:

(1) Through theoretical analysis and experimental statistics, the factors that affect the complexity of H.264 sign language video coding are identified. The H.264 sign language video encoder parameters are divided into four levels by complexity, each with a different coding complexity and coding quality, and the coding level is then selected adaptively according to the battery energy of the wireless video terminal and the motion complexity of the video. Experimental results show that the method reduces the encoder's computational complexity and saves computing resources on the wireless video terminal while keeping the sign language video coding quality essentially unchanged.

(2) Taking into account both the time-varying nature of the terminal's battery energy and the non-uniformity of deaf viewers' visual attention, a region-of-interest energy-aware sign language video coding method is established. At the frame level, the method determines the number of reference frames and the search range from the battery energy currently available to the terminal and the complexity of the video frame. At the macroblock level, it determines the macroblock prediction mode and the quantization parameter from the visual importance of the different macroblock regions of the sign language video. Encoding then proceeds with the parameters determined jointly by the frame level and the macroblock level. Experimental results show that the method further reduces the encoder's computational complexity and saves computing resources on the wireless video terminal while preserving the coding quality of the region of interest.

(3) The power-rate-distortion (P-R-D) characteristics of the three H.264 coding modes (intra, inter, and skip) are analyzed in detail. On this basis, an energy-consumption model and a P-R-D model for encoding one frame of sign language video are established, and an algorithm is proposed to optimize the number of macroblocks coded in intra, inter, and skip mode within a frame. Experiments show that the proposed P-R-D model agrees with the measured P-R-D performance.

(4) For gesture detection in sign language video under hand-face occlusion, a detection method based on the force field transform is proposed. The method first computes the force field images of the hand-face occlusion frame and of a face-only frame. It then partitions the force field images into blocks and computes the histogram features of each block. Next, it subtracts the histograms of blocks at the same spatial position to obtain the gray-level component difference of each block histogram. Finally, it compares each block's difference with a gray-level threshold to obtain the hand position. Experiments show that the method can detect gestures under hand-face occlusion in real time.
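
The four steps of the force-field gesture detection in (4) can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything concrete here is an assumption: the force field is taken to be the common inverse-square definition (each pixel attracts every other pixel in proportion to its intensity), the per-block "difference" is taken to be half the L1 distance between normalised histograms, and the function names, block size, bin count, and threshold are all hypothetical. It is not the thesis's implementation.

```python
import numpy as np


def force_field_magnitude(img):
    """Magnitude of an inverse-square force field (ASSUMED definition).

    Pixel i pulls pixel j with force P_i * (r_i - r_j) / |r_i - r_j|**3.
    Brute force O(N^2): only suitable for small, downsampled frames.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = (xs + 1j * ys).ravel()            # pixel positions as complex numbers
    p = img.ravel().astype(np.float64)      # pixel intensities
    diff = pos[None, :] - pos[:, None]      # diff[j, i] = r_i - r_j
    dist = np.abs(diff)
    np.fill_diagonal(dist, np.inf)          # a pixel exerts no force on itself
    force = (p[None, :] * diff / dist ** 3).sum(axis=1)
    return np.abs(force).reshape(h, w)


def block_histograms(field, block, bins, value_range):
    """Normalised histogram of every non-overlapping block x block tile."""
    rows, cols = field.shape[0] // block, field.shape[1] // block
    hists = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            tile = field[r * block:(r + 1) * block, c * block:(c + 1) * block]
            counts, _ = np.histogram(tile, bins=bins, range=value_range)
            hists[r, c] = counts / tile.size
    return hists


def detect_hand_blocks(ff_occluded, ff_face, block=8, bins=16, threshold=0.25):
    """Flag blocks whose histogram differs from the face-only frame.

    The threshold is a hypothetical value, not one taken from the thesis.
    """
    top = max(ff_occluded.max(), ff_face.max())
    value_range = (0.0, float(top) if top > 0 else 1.0)
    h_occ = block_histograms(ff_occluded, block, bins, value_range)
    h_face = block_histograms(ff_face, block, bins, value_range)
    # Half the L1 distance between two normalised histograms lies in [0, 1].
    diff = 0.5 * np.abs(h_occ - h_face).sum(axis=2)
    return diff > threshold


# Usage on synthetic 16x16 frames: a "face" square, then a "hand" patch over it.
face = np.zeros((16, 16))
face[4:12, 4:12] = 100.0
occluded = face.copy()
occluded[8:16, 8:16] = 200.0
mask = detect_hand_blocks(force_field_magnitude(occluded),
                          force_field_magnitude(face))
print(mask)  # boolean grid, one entry per 8x8 block
```

Because the force field is a global transform, a hand patch changes the field outside its own block as well, so in practice the threshold would have to be tuned on real frames; the sketch only shows how the per-block comparison is wired together.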
[Degree-granting institution]: Lanzhou University of Technology
[Degree level]: Doctoral
[Year degree conferred]: 2014
[Classification number]: TN919.81


Article ID: 2273345


Article link: http://sikaile.net/kejilunwen/wltx/2273345.html
