Research on Universal Coding Methods for Speech and Audio Signals
Posted: 2018-05-22 13:56
Topics: speech coding; audio coding. Source: Beijing University of Technology, doctoral dissertation, 2014.
[Abstract]: With the rapid development of network communication, mobile communication, and multimedia technology, the convergence of different networks, systems, and service platforms has become inevitable. Under this trend, the boundary between communication and entertainment is fading: users are no longer satisfied with voice-only communication and increasingly expect services that carry both speech and audio. However, because of the limitations of their algorithmic models, traditional speech and audio coders cannot achieve satisfactory coding quality for speech, audio, and mixed signals at the same time, which has held back the further development of mobile multimedia technology.
Against this background, the Moving Picture Experts Group (MPEG) launched an initiative to build a universal speech and audio coder: a single unified coding model that handles speech, audio, and mixed signals, overcoming the drawback that traditional speech coders and audio coders are each suited to only one signal type. The initiative quickly became a hot topic in speech and audio coding research, and many institutions are now working on universal coding algorithms.
To address this problem, this thesis studies existing speech and audio coding techniques in depth and, starting from the harmonic structure shared by speech and audio signals, proposes two universal coding frameworks, ultimately achieving universal coding of wideband speech and audio signals at bit rates of 24 kbps and 32 kbps.
The main contributions of this thesis are as follows:
1. Based on the idea of separating a signal into feature components, this thesis builds a universal coding framework around the harmonic structure common to speech and audio signals. Discarding the type-classification-and-selection mechanism of existing universal coders, the framework analyzes the input with a single unified model and achieves universal coding by keeping the signal's probability density distribution consistent before and after quantization. This avoids two weaknesses of existing universal coders: over-reliance on signal-type discrimination and poorly chosen quantization schemes for mixed signals.
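The idea of keeping the signal's distribution consistent through quantization can be illustrated with a toy rank-matching quantizer. This is only a sketch: the thesis applies distribution-preserving quantization to transform coefficients, while `rank_match_quantize`, the Gaussian source, and the 64-word codebook below are illustrative assumptions.

```python
import random
import statistics

random.seed(7)

def rank_match_quantize(x, codebook):
    """Map the k-th smallest input sample to the codeword at the matching
    quantile of the codebook, so the output's empirical distribution follows
    the codebook's (assumed to match the source) instead of collapsing onto
    a few densely populated codewords."""
    cb = sorted(codebook)
    order = sorted(range(len(x)), key=lambda i: x[i])
    out = [0.0] * len(x)
    for rank, i in enumerate(order):
        # codeword index at the same relative rank position
        j = rank * len(cb) // len(x)
        out[i] = cb[j]
    return out

# Source samples and a codebook drawn from the same (Gaussian) distribution.
x = [random.gauss(0.0, 1.0) for _ in range(5000)]
codebook = [random.gauss(0.0, 1.0) for _ in range(64)]

q = rank_match_quantize(x, codebook)
print(round(statistics.pstdev(x), 2), round(statistics.pstdev(q), 2))
```

Because every codeword is used roughly equally often, the spread of the quantized output stays close to that of the source, which is the property the framework exploits.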
2. The Empirical Mode Decomposition (EMD) algorithm is introduced into speech and audio coding. Drawing on the perceptual importance and periodicity of the input signal's intrinsic mode functions, and exploiting the adaptive filtering behavior of EMD, a signal-driven harmonic separation algorithm is proposed; extracting the harmonic component of the input signal improves the accuracy of sinusoidal model parameter estimation.
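To make the EMD step concrete, here is a heavily simplified sifting sketch in pure Python. Standard EMD interpolates the envelopes with cubic splines and uses a data-driven stopping criterion; the linear envelopes, fixed sift count, and two-tone test signal below are illustrative assumptions.

```python
import math

def local_extrema(x):
    """Indices of interior local maxima and minima."""
    maxi, mini = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1]:
            maxi.append(i)
        elif x[i] < x[i - 1] and x[i] <= x[i + 1]:
            mini.append(i)
    return maxi, mini

def envelope(idx, vals, n):
    """Piecewise-linear envelope through (idx, vals), extended to both ends."""
    pts = [(0, vals[0])] + list(zip(idx, vals)) + [(n - 1, vals[-1])]
    env, k = [], 0
    for i in range(n):
        while k + 1 < len(pts) - 1 and pts[k + 1][0] <= i:
            k += 1
        (i0, v0), (i1, v1) = pts[k], pts[k + 1]
        t = (i - i0) / (i1 - i0) if i1 > i0 else 0.0
        env.append(v0 + t * (v1 - v0))
    return env

def sift(x, n_sifts=8):
    """Extract one intrinsic mode function (IMF) by repeatedly subtracting
    the mean of the upper and lower extrema envelopes."""
    h = list(x)
    for _ in range(n_sifts):
        maxi, mini = local_extrema(h)
        if len(maxi) < 2 or len(mini) < 2:
            break
        upper = envelope(maxi, [h[i] for i in maxi], len(h))
        lower = envelope(mini, [h[i] for i in mini], len(h))
        h = [h[i] - 0.5 * (upper[i] + lower[i]) for i in range(len(h))]
    return h

# Fast "harmonic-like" tone riding on a slow drift.
n = 512
fast = [math.sin(2 * math.pi * 25 * i / n) for i in range(n)]
slow = [0.8 * math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
x = [fast[i] + slow[i] for i in range(n)]

imf1 = sift(x)                                # first IMF: the fast oscillation
residue = [x[i] - imf1[i] for i in range(n)]  # remainder: the slow drift
```

The first IMF captures the fastest oscillation; sifting the residue again would yield further IMFs, which is the adaptive-filter behavior the harmonic separation algorithm builds on.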
3. A universal sinusoidal-parameter coding algorithm based on harmonic separation is proposed. It encodes the different feature components of the input signal separately in a hybrid scheme, so that parametric coding and transform coding each play to their strengths and the system as a whole is optimized. For the harmonic component, a perceptually gradient-weighted matching pursuit algorithm performs sinusoidal parameter modeling and multi-resolution quantization; for the non-harmonic component, a dithered lattice vector quantization method based on the RE8 lattice is proposed, which shapes the quantization noise into white Gaussian noise independent of the original signal and thereby improves the subjective quality of the synthesized signal.
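A stripped-down version of the harmonic-side modeling can be sketched as plain matching pursuit over a DFT dictionary, without the perceptual gradient weighting or multi-resolution quantization of the thesis; `extract_sinusoids` and the two-tone test signal are illustrative assumptions.

```python
import cmath
import math

def extract_sinusoids(x, n_partials):
    """Greedy matching pursuit: repeatedly pick the DFT bin with the largest
    magnitude, record its (bin, amplitude, phase), and subtract that sinusoid
    from the residual."""
    n = len(x)
    res = list(x)
    params = []
    for _ in range(n_partials):
        spec = [sum(res[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)) for k in range(n // 2 + 1)]
        k = max(range(1, n // 2), key=lambda i: abs(spec[i]))
        amp = 2 * abs(spec[k]) / n
        phase = cmath.phase(spec[k])
        params.append((k, amp, phase))
        for t in range(n):
            res[t] -= amp * math.cos(2 * math.pi * k * t / n + phase)
    # 'res' now holds the non-harmonic part (handled by lattice VQ in the thesis)
    return params, res

n = 256
x = [1.0 * math.cos(2 * math.pi * 10 * t / n)
     + 0.5 * math.cos(2 * math.pi * 30 * t / n + 1.0) for t in range(n)]
params, res = extract_sinusoids(x, 2)
print(params)
```

For bin-aligned tones like these, the two extracted (frequency, amplitude, phase) triples reproduce the input exactly, leaving a near-zero residual to be coded by the transform path.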
4. To improve the proposed universal sinusoidal coder's quality on speech, pitch-synchronous analysis is combined with power-spectrum-preserving quantization to form a pitch-synchronous speech quantization method. Using the fundamental frequency of the input signal, the algorithm warps the input into a signal with a fixed period and applies a sparsifying transform to the warped periodic signal; concentrating the energy in this way sparsifies the modulated transform coefficients of voiced speech and improves the coder's compression efficiency for speech.
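The effect of pitch-synchronous warping on transform-domain sparsity can be demonstrated on a synthetic voiced-like signal. This is only a sketch: period boundaries are assumed known, warping is plain linear resampling, and `waveform`, the drifting period lengths, and the 48-sample target period are illustrative assumptions; the thesis works from an estimated fundamental frequency and a modulated transform.

```python
import cmath
import math

def waveform(u):
    """One pitch cycle: fundamental plus one overtone (u in [0, 1))."""
    return math.cos(2 * math.pi * u) + 0.4 * math.cos(4 * math.pi * u)

def warp_period(seg, m):
    """Resample one pitch period to a fixed length m (linear interpolation,
    periodic extension at the right edge)."""
    L = len(seg)
    out = []
    for j in range(m):
        pos = j * L / m
        i = int(pos)
        t = pos - i
        out.append(seg[i] * (1 - t) + seg[(i + 1) % L] * t)
    return out

def top_bin_energy_ratio(x, n_bins):
    """Fraction of spectral energy captured by the n_bins largest DFT bins."""
    n = len(x)
    mags = sorted((abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                           for t in range(n))) ** 2
                   for k in range(1, n // 2)), reverse=True)
    return sum(mags[:n_bins]) / sum(mags)

# Voiced-like signal whose pitch period drifts from 40 to 49 samples.
periods = [[waveform(j / L) for j in range(L)] for L in range(40, 50)]
x = [v for seg in periods for v in seg]

# Pitch-synchronous warping: every period is normalized to 48 samples,
# so the warped signal is (almost) exactly periodic.
warped = [v for seg in periods for v in warp_period(seg, 48)]

print(top_bin_energy_ratio(x, 4), top_bin_energy_ratio(warped, 4))
```

After warping, nearly all the spectral energy collapses onto the harmonic bins, while the unwarped drifting signal smears its energy across neighboring bins; the sparser coefficient set is what makes the quantizer cheaper to run on voiced speech.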
5. Building on the pitch-synchronous analysis algorithm, an adaptive analysis-window-length decision method based on energy-weighted normalized cross-correlation is proposed, enabling unified analysis of speech, audio, and mixed signals. Combined with probability-distribution-preserving quantization, this yields a universal speech and audio coding algorithm that operates in the transform domain and achieves universal coding by keeping the probability distribution of the signal consistent before and after coding. Final tests show that the proposed algorithm outperforms the AMR-WB and ITU-T G.722.1 coding standards in coding quality for both wideband speech and audio signals.
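The window-length decision can be sketched with a plain normalized cross-correlation test between consecutive segments. The energy weighting and the exact decision rule of the thesis are not reproduced here; `pick_window`, the 0.9 threshold, and the candidate lengths are illustrative assumptions.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length segments."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0

def pick_window(x, candidates, threshold=0.9):
    """Choose the longest analysis window whose two consecutive halves are
    still strongly correlated: steady tonal content admits long windows,
    while a change in signal character forces a short one."""
    best = candidates[0]
    for w in sorted(candidates):
        if 2 * w <= len(x) and ncc(x[:w], x[w:2 * w]) >= threshold:
            best = w
    return best

n = 1024
# Steady tone vs. a signal that switches frequency mid-frame (transient-like).
tone = [math.sin(2 * math.pi * 32 * i / n) for i in range(n)]
transient = [math.sin(2 * math.pi * (32 if i < 256 else 80) * i / n)
             for i in range(n)]

print(pick_window(tone, [64, 128, 256, 512]),
      pick_window(transient, [64, 128, 256, 512]))
```

The steady tone keeps its correlation at every scale and is assigned the longest window, while the mid-frame frequency switch breaks the correlation at larger scales and forces a short window, which is the adaptive behavior the unified analysis needs.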
【Degree-granting institution】: Beijing University of Technology
【Degree level】: Doctorate
【Year conferred】: 2014
【CLC number】: TN912.3
【參考文獻(xiàn)】
相關(guān)期刊論文 前4條
1 李海婷;范睿;朱恒;劉澤新;鮑長(zhǎng)春;賈懋s