Research on Universal Coding Methods for Speech and Audio Signals
Published: 2018-05-22 13:56
Topic terms: speech coding + audio coding; Source: Beijing University of Technology, 2014 doctoral dissertation
[Abstract]: With the rapid development of network communication, mobile communication, and multimedia technology, convergence among different networks, systems, and service platforms has become inevitable. Under this trend the boundary between communication and entertainment has blurred: users are no longer satisfied with voice-only communication and increasingly expect services that handle speech and general audio alike. However, because of the limitations of their underlying algorithm models, traditional speech and audio coders cannot achieve satisfactory quality on speech, audio, and mixed signals at the same time, which restricts the further development of mobile multimedia technology.
Against this background, the Moving Picture Experts Group (MPEG) proposed an initiative to build a universal speech and audio coder: a single unified coding model for speech, audio, and mixed signals, overcoming the drawback that conventional speech and audio coders each suit only one type of signal. The initiative immediately became a hot topic in speech and audio coding research, and many institutions are now studying universal coding algorithms.
To address this problem, this dissertation studies existing speech and audio coding techniques in depth, starts from the harmonic structure shared by speech and audio signals, proposes two universal coding frameworks, and ultimately realizes universal coding of wideband speech and audio signals at 24 kbps and 32 kbps.
The main contributions of this dissertation are as follows:
1. Based on the idea of separating a signal into characteristic components, a universal coding framework is built around the harmonic structure shared by speech and audio signals. The framework abandons the type-classification-and-selection mechanism of existing universal coding techniques: it analyzes the input with a single unified model and achieves universal coding by keeping the probability density distribution of the signal consistent before and after quantization, thereby avoiding existing universal coders' over-reliance on signal-type classification and their poorly matched choice of quantization scheme for mixed signals.
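A toy sketch of the distribution-preserving idea in item 1 above: uniformly quantize a block of transform coefficients, then restore the input's empirical distribution at the decoder by rank-based quantile matching. The Gaussian test signal, the step size, and the quantile-matching recipe are illustrative assumptions, not the thesis's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
coeffs = rng.normal(0.0, 1.0, 4096)      # stand-in transform coefficients

# plain uniform quantization (what a naive coder would transmit)
step = 0.5
q = np.round(coeffs / step) * step

# distribution restoration: map each quantized value back onto the input's
# empirical distribution by rank (quantile matching) -- a toy version of
# "keeping the probability density distribution consistent before and after
# quantization", NOT the thesis's algorithm
ranks = np.argsort(np.argsort(q, kind="stable"), kind="stable")
restored = np.quantile(coeffs, (ranks + 0.5) / q.size)
```

Plain dequantization collapses the coefficient distribution onto a few spikes; the rank mapping restores the original spread (the standard deviation of `restored` closely matches that of `coeffs`) while staying close to the transmitted values.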
2. Empirical Mode Decomposition (EMD) is introduced into speech and audio coding. Based on the perceptual importance and periodicity of the input signal's intrinsic mode functions, and exploiting the adaptive filtering behavior of EMD, a signal-driven harmonic separation algorithm is proposed; extracting the harmonic component of the input improves the accuracy of sinusoidal model parameter estimation.
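Item 2 relies on EMD's sifting process to peel off the fastest oscillation (the first intrinsic mode function). Below is a minimal sketch of one sifting step, assuming linear envelopes via `np.interp` in place of the customary cubic splines and a fixed iteration count in place of a proper stopping criterion:

```python
import numpy as np

def sift_once(x):
    """One EMD sifting pass: subtract the mean of the upper and lower
    envelopes. Linear envelopes (np.interp through the local extrema)
    stand in for the usual cubic splines."""
    n = np.arange(x.size)
    # interior local maxima / minima
    maxima = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    minima = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    if maxima.size < 2 or minima.size < 2:
        return x  # monotonic residue: nothing left to sift
    upper = np.interp(n, maxima, x[maxima])
    lower = np.interp(n, minima, x[minima])
    return x - (upper + lower) / 2.0

# fast 50 Hz "harmonic" riding on a slow 3 Hz drift, 1 kHz sampling
t = np.arange(1000) / 1000.0
fast = np.sin(2 * np.pi * 50 * t)
slow = 0.8 * np.sin(2 * np.pi * 3 * t)
imf = fast + slow
for _ in range(8):        # a few sifting iterations instead of a stop rule
    imf = sift_once(imf)
```

On this two-tone signal the sifting passes leave essentially the 50 Hz component: the kind of fast harmonic structure that, in the thesis, is separated out and fed to the sinusoidal model.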
3. A universal sinusoidal-parameter coding algorithm based on harmonic separation is proposed. It encodes the different characteristic components of the input separately, in a hybrid scheme that exploits the complementary strengths of parametric and transform coding so as to reach an overall system optimum. The harmonic component is modeled with a perceptual-gradient-weighted matching pursuit and coded with multi-resolution quantization; for the non-harmonic component, a dithered lattice vector quantization method based on the RE8 lattice is proposed, which makes the quantization noise behave as Gaussian white noise independent of the original signal and thereby improves the subjective quality of the synthesized signal.
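The harmonic-component modeling in item 3 uses matching pursuit over sinusoidal atoms. Below is a plain, unweighted matching pursuit on a Fourier-frequency dictionary; the perceptual-gradient weighting, multi-resolution quantization, and RE8 stage of the thesis are deliberately omitted, and all names and parameters are illustrative:

```python
import numpy as np

def sinusoid_mp(x, freqs, n_terms):
    """Greedy matching pursuit over sinusoidal atoms: at each step pick the
    frequency best correlated with the residual, read amplitude and phase
    off the complex inner product, subtract the atom, repeat."""
    n = np.arange(x.size)
    residual = x.astype(float).copy()
    picked = []
    for _ in range(n_terms):
        corr = np.array([np.exp(-2j * np.pi * f * n).dot(residual) for f in freqs])
        k = int(np.argmax(np.abs(corr)))
        amp, phase = 2 * np.abs(corr[k]) / x.size, np.angle(corr[k])
        residual -= amp * np.cos(2 * np.pi * freqs[k] * n + phase)
        picked.append((freqs[k], amp, phase))
    return picked, residual

N = 400
freqs = np.arange(1, 60) / N            # Fourier-frequency dictionary
f1, f2 = freqs[19], freqs[43]           # 20/400 and 44/400 cycles/sample
n = np.arange(N)
x = 1.0 * np.cos(2 * np.pi * f1 * n) + 0.5 * np.cos(2 * np.pi * f2 * n + 1.0)
picked, residual = sinusoid_mp(x, freqs, 2)
```

Because the test signal's two partials lie exactly on dictionary frequencies, two greedy picks recover their amplitudes and phases and drive the residual to numerical zero; on real signals the residual after a few terms is what the thesis hands to the non-harmonic (lattice VQ) branch.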
4. To improve the proposed sinusoidal-parameter coder's quality on speech, pitch-synchronous analysis is combined with power-spectrum-preserving quantization in a pitch-synchronous speech quantization method. Using the pitch information of the input, the signal is warped into one with a fixed period; applying a sparsifying transform to the warped periodic signal concentrates its energy, sparsifying the modulation-transform coefficients of voiced speech and raising the coder's compression efficiency on speech.
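Item 4's pitch-synchronous regularization can be illustrated on a toy harmonic signal with a deliberately non-integer pitch period: resampling each cycle onto a fixed-length grid makes the signal exactly periodic on that grid, so a block transform packs its energy into very few bins. The linear-interpolation warp, the period value, and the FFT (standing in for the thesis's modulation transform) are all assumptions.

```python
import numpy as np

def pitch_normalize(x, period, target_len):
    """Resample each pitch cycle of x onto target_len points (linear
    interpolation), turning a constant-period signal into one exactly
    periodic on the fixed grid."""
    n_cycles = int(x.size // period)
    out = np.empty(n_cycles * target_len)
    grid = np.arange(x.size, dtype=float)
    for c in range(n_cycles):
        src = (c + np.arange(target_len) / target_len) * period
        out[c * target_len:(c + 1) * target_len] = np.interp(src, grid, x)
    return out

period = 57.3                             # non-integer pitch period in samples
n = np.arange(4096)
x = np.cos(2 * np.pi * n / period) + 0.5 * np.cos(4 * np.pi * n / period)

warped = pitch_normalize(x, period, 64)

def top2_energy_fraction(sig):
    spec = np.abs(np.fft.rfft(sig)) ** 2
    return np.sort(spec)[-2:].sum() / spec.sum()
```

Before warping, the roughly 71.5 pitch cycles in the block leak energy across many FFT bins; after warping, over 99% of the energy sits in just two bins. This energy concentration is the sparsification that raises compression efficiency on voiced speech.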
5. On top of the pitch-synchronous analysis, an adaptive analysis-window-length decision method based on energy-weighted normalized cross-correlation is proposed, enabling unified analysis of speech, audio, and mixed signals. Combined with probability-distribution-preserving quantization, this yields a transform-domain universal speech and audio coding algorithm that achieves universal coding by keeping the probability distribution characteristics of the signal consistent before and after coding. Final tests show that, for wideband speech and audio signals, the proposed algorithm outperforms the AMR-WB and ITU-T G.722.1 coding standards.
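A stripped-down stand-in for item 5's window-length decision: score a frame's periodicity by its peak normalized autocorrelation over a lag range, then switch between a long analysis window (tonal/voiced content) and a short one (noise-like content). The energy weighting and the thesis's actual window sizes and threshold are replaced here by made-up values.

```python
import numpy as np

def choose_window(frame, min_lag=32, max_lag=256, threshold=0.6):
    """Periodicity score = peak normalized autocorrelation over
    [min_lag, max_lag). Strongly periodic frames get a long window,
    others a short one; sizes/threshold are illustrative only."""
    best = 0.0
    for lag in range(min_lag, max_lag):
        a, b = frame[:-lag], frame[lag:]
        denom = np.sqrt(a.dot(a) * b.dot(b))
        if denom > 0.0:
            best = max(best, a.dot(b) / denom)
    return (2048 if best >= threshold else 256), best

t = np.arange(1024)
voiced = np.sin(2 * np.pi * t / 80.0)                 # pitch period: 80 samples
noise = np.random.default_rng(1).normal(size=1024)    # noise-like frame
```

`choose_window(voiced)` selects the long window and `choose_window(noise)` the short one; in the thesis, the same kind of decision is what lets a single analysis front end serve speech, audio, and mixed input.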
【Degree-granting institution】: Beijing University of Technology
【Degree level】: Doctorate
【Year degree conferred】: 2014
【CLC number】: TN912.3
Article ID: 1922368
Link to this article: http://sikaile.net/kejilunwen/wltx/1922368.html