Research on H.265 Video Transcoding Based on Machine Learning
Published: 2019-01-02 07:34
[Abstract]: As computer hardware improves and Internet and multimedia technologies continue to develop, the demand for video and image transmission and storage has become the greatest challenge to today's network bandwidth and storage hardware. Video coding standards keep evolving, and new applications such as 4K, VR, 3D, and HDR are gradually becoming widespread. At the same time, product promotion, commercial competition, patent protection, and other competing factors make the coexistence of multiple video coding standards in the same period unavoidable. Fast transcoding tools and techniques are therefore urgently needed. Against this background, the thesis focuses on homogeneous and heterogeneous video transcoding targeting the HEVC/H.265 standard. The work consists of three parts.

(1) Classification and study of transcoding frameworks. Three classes of transcoding framework are analyzed, and the full-decode, partial-encode framework is chosen for implementation. Transcoding is accelerated by adaptively constructing the quadtree coding tree. Two coding-tree construction methods are proposed: the bottom-up method and the middle-to-both-ends method. The bottom-up method reduces layer by layer from the smallest coding units at the bottom to the top layer, yielding a complete quadtree coding tree. The middle-to-both-ends method predicts upward and downward from the features of the middle layer, aiming to obtain the distribution of the parent and child nodes of each middle-layer node and thereby build the coding tree.

(2) Video transcoding based on bit-distribution mapping. The number of coded bits in the bitstream represents the information-entropy cost and directly reflects the physical properties of the sequence itself. The more bits a coding unit uses, the more likely the image at its location has rich content, complex texture, or rapid change. The distribution of bits is used to judge how each coding unit is partitioned, and the coding tree structure is then built from a mapping model, achieving fast video transcoding.

(3) Video transcoding based on machine learning. The bitstream is parsed to extract per-coding-unit bit counts, motion vectors, prediction modes, and similar information. Content features of the coding unit itself, such as variance, gradient, and blurriness, are computed directly. Machine learning is introduced, with this information expressed abstractly as feature values for training a block-partition prediction model. The bottom-up and middle-to-both-ends construction methods are then used to build complete quadtree coding tree units, achieving fast video transcoding based on machine learning.

Together, these three parts yield two fast video transcoding algorithms, one based on bit distribution and one based on machine learning. Extensive video encoding and decoding tests show that, with subjective and objective quality kept essentially the same, both algorithms save more than half the time of the full-decode, full-encode transcoding method currently used in industry, that is, a speedup of about 2x for both homogeneous and heterogeneous transcoding. The work may offer a useful technical reference for real-time transcoding at large video websites.
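The bottom-up construction driven by bit distribution can be illustrated with a small sketch. The abstract does not give the actual mapping model, so everything concrete here is an assumption made for illustration: the `Node` class, the function names `build_coding_tree` and `leaf_sizes`, the per-level `merge_thresholds`, and the merge rule itself (four sibling blocks collapse into one larger coding unit when none of them is already split and their combined bit count stays under the threshold for that level).

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Node:
    """One block of the coding tree (hypothetical structure)."""
    size: int                       # block width in pixels
    bits: int                       # bits spent on this block in the source stream
    children: List["Node"] = field(default_factory=list)  # empty means "not split"

    @property
    def is_leaf(self) -> bool:
        return not self.children


def build_coding_tree(bit_map: Sequence[Sequence[int]],
                      min_cu: int = 8,
                      merge_thresholds: Sequence[int] = (64, 128, 256)) -> Node:
    """Build a quadtree bottom-up from per-minimum-CU bit counts.

    bit_map is an n x n grid (n a power of two) of bit counts, one per
    smallest coding unit. merge_thresholds[i] is the assumed bit budget
    under which four siblings are merged at the i-th reduction step.
    """
    n = len(bit_map)
    if n == 0 or n & (n - 1) or any(len(row) != n for row in bit_map):
        raise ValueError("bit_map must be square with a power-of-two side")
    if len(merge_thresholds) < n.bit_length() - 1:
        raise ValueError("need one merge threshold per reduction step")

    # Start with every smallest coding unit as its own leaf.
    level = [[Node(min_cu, bit_map[r][c]) for c in range(n)] for r in range(n)]
    step = 0
    while len(level) > 1:
        limit = merge_thresholds[step]
        half = len(level) // 2
        parents = []
        for r in range(half):
            row = []
            for c in range(half):
                # Four siblings in z-order: top-left, top-right, bottom-left, bottom-right.
                kids = [level[2 * r][2 * c], level[2 * r][2 * c + 1],
                        level[2 * r + 1][2 * c], level[2 * r + 1][2 * c + 1]]
                total = sum(k.bits for k in kids)
                size = kids[0].size * 2
                if all(k.is_leaf for k in kids) and total <= limit:
                    row.append(Node(size, total))        # low activity: merge into one CU
                else:
                    row.append(Node(size, total, kids))  # keep the split
            parents.append(row)
        level = parents
        step += 1
    return level[0][0]


def leaf_sizes(node: Node) -> List[int]:
    """Sizes of the final coding units, in z-order."""
    if node.is_leaf:
        return [node.size]
    sizes: List[int] = []
    for child in node.children:
        sizes.extend(leaf_sizes(child))
    return sizes


if __name__ == "__main__":
    # A 32x32 region: only the top-right quadrant carries many bits.
    demo = [[1, 1, 50, 2],
            [1, 1, 3, 4],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
    tree = build_coding_tree(demo, min_cu=8, merge_thresholds=(10, 40))
    print(leaf_sizes(tree))  # [16, 8, 8, 8, 8, 16, 16]
```

In this sketch the fixed thresholds stand in for whatever mapping model or trained classifier decides each merge; replacing the `total <= limit` test with a learned predictor over features such as bit count, variance, and gradient would leave the bottom-up reduction unchanged.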
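[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TN919.8;TP181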
Article ID: 2398197
Article link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/2398197.html