Research on Speech Compressed Sensing Based on an MFCC Dictionary and the SL0 Algorithm
Posted: 2018-05-08 04:15
Topic: speech signal + compressed sensing; Source: Nanjing University of Posts and Telecommunications, master's thesis, 2017
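【Abstract】: In the compressed sensing (CS) framework, signals are sampled at a rate below that required by the traditional Nyquist sampling theorem, compression is performed at the same time as sampling, and high-quality reconstruction is achieved from relatively few measurements. Speech signals are highly sparse in the frequency domain, the discrete cosine transform (DCT) domain and other transform domains, so they satisfy the prior condition of compressed sensing and can be processed within the CS framework. Applying compressed sensing to speech signal processing greatly facilitates the sampling, storage and transmission of signals, and using CS theory to explore new speech processing methods has considerable theoretical significance and practical value. The goal of this thesis is to design and optimize the sparse decomposition basis and the reconstruction algorithm for speech compressed sensing, so that speech signals are reconstructed with higher quality in less time, laying a theoretical foundation for practical applications of speech compressed sensing. The thesis concentrates on the sparse representation and reconstruction stages of speech compressed sensing: it proposes an overcomplete dictionary built from speech MFCC parameters and a speech compression and reconstruction model based on the smoothed L0 (SL0) algorithm, enriching the theory of speech compressed sensing. The main research content and contributions are as follows.
(1) The differences and connections between compressed sensing and traditional Nyquist sampling are introduced, and the theoretical framework of compressed sensing is analyzed. Applications of compressed sensing in speech processing are described in detail, including the sparse bases, measurement matrices and reconstruction algorithms commonly used in speech compressed sensing. The sparsity of speech in the DCT basis, an overcomplete DCT dictionary and a K-SVD dictionary is verified experimentally, and the reconstruction results obtained with different sparse bases, measurement matrices and reconstruction algorithms are compared. The results show that the choice of sparse basis, measurement matrix and reconstruction algorithm, as well as the speech frame length and the compression ratio, all affect how well speech is reconstructed.
(2) A method for constructing an overcomplete dictionary from speech MFCC parameters is proposed. The extraction of MFCC parameters from speech and the implementation of speech compressed sensing with the overcomplete MFCC dictionary are described. Experiments show that speech is sparse in the overcomplete MFCC dictionary and that, for the same number of training utterances and the same dictionary size, the overcomplete MFCC dictionary takes far less time to train than the traditional K-SVD dictionary, which makes dictionary training easier to carry out; the advantage becomes more pronounced as the corpus grows. Using the overcomplete MFCC dictionary for speech compressed sensing is therefore feasible and worthwhile.
(3) A speech compression and reconstruction model based on the SL0 algorithm is proposed. SL0 approximates the L0 norm with a smooth function; it does not need the signal's sparsity to be known in advance, has a low computational cost and gives high reconstruction quality. In addition, a new smoothing function is proposed, and the advantage of SL0 for speech compression and reconstruction is verified with both the Gaussian function and the new smoothing function. Experiments show that SL0 with either smoothing function outperforms commonly used algorithms such as OMP and BP when reconstructing speech. Moreover, when the compression ratio is above 0.4, the SL0 model with the new smoothing function achieves higher speech reconstruction quality than the SL0 model with the standard Gaussian function.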
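Point (1) names the three ingredients of compressed sensing: a sparse basis, a measurement matrix and a reconstruction algorithm. The sketch below strings them together for a single frame. It is a minimal illustration under stated assumptions, not the thesis's implementation: it uses an orthonormal DCT-II basis, a Gaussian random measurement matrix, a synthetic frame that is exactly K-sparse in the DCT domain (not recorded speech), and orthogonal matching pursuit (OMP), one of the baselines the abstract mentions. The helper names `dct_matrix` and `omp`, the sizes N = 256, M = 128, K = 8, and reading the compression ratio as M/N are all hypothetical choices.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: theta = C @ x (analysis), x = C.T @ theta (synthesis)."""
    k = np.arange(n)[:, None]                 # frequency index (rows)
    i = np.arange(n)[None, :]                 # time-sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)                   # rescale the DC row so that C is orthonormal
    return c

def omp(a: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Orthogonal matching pursuit: greedy k-sparse solution of a @ theta = y."""
    residual = y.copy()
    support: list[int] = []
    coef = np.zeros(0)
    for _ in range(k):
        # select the atom (column) most correlated with the current residual
        j = int(np.argmax(np.abs(a.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares fit on all selected atoms, then recompute the residual
        coef, *_ = np.linalg.lstsq(a[:, support], y, rcond=None)
        residual = y - a[:, support] @ coef
    theta = np.zeros(a.shape[1])
    theta[support] = coef
    return theta

rng = np.random.default_rng(0)
n, m, k = 256, 128, 8                         # frame length N, measurements M, sparsity K
psi = dct_matrix(n).T                         # sparse basis (synthesis form): x = psi @ theta
theta_true = np.zeros(n)
theta_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
x = psi @ theta_true                          # synthetic frame, exactly K-sparse in the DCT domain
phi = rng.normal(size=(m, n)) / np.sqrt(m)    # Gaussian random measurement matrix
y = phi @ x                                   # M < N compressed measurements
theta_hat = omp(phi @ psi, y, k)              # OMP has to be told the sparsity K
x_hat = psi @ theta_hat
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"compression ratio M/N = {m / n}, relative error = {rel_err:.2e}")
```

Because OMP takes K as an input, it needs a sparsity estimate that real speech frames, which are only approximately sparse, do not supply; on recorded audio the error would also depend on frame length and compression ratio, which is the kind of effect point (1) reports.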
[Abstract]: Within the compressed sensing framework, the sampling rate can be lower than that required by the traditional Nyquist sampling theorem, compression and sampling are carried out simultaneously, and a high-quality signal can be reconstructed from fewer measurements. Speech signals are sparse in the frequency domain and in the discrete cosine transform domain, which satisfies the prior condition of compressed sensing, so speech signals can be processed with compressed sensing. Applying compressed sensing to speech signal processing brings great convenience to signal sampling, storage and transmission. Applying compressed sensing theory to speech signals to explore new speech processing methods is of great theoretical significance and practical value. The research goal of this thesis is to design and optimize the sparse decomposition basis and the reconstruction algorithm for speech compressed sensing, so that speech signals achieve higher reconstruction quality with shorter reconstruction time, laying the theoretical foundation for practical applications of speech compressed sensing. This thesis studies sparse representation and reconstruction algorithms in speech compressed sensing. An overcomplete dictionary based on speech MFCC parameters is proposed, together with a speech compression and reconstruction model based on the smoothed L0 algorithm, which enrich the theory of speech compressed sensing. The main research contents and contributions of this thesis are as follows.
(1) The differences and relationship between compressed sensing and traditional Nyquist sampling are introduced, and the theoretical framework of compressed sensing is analyzed. The application of compressed sensing in speech processing is described in detail, including sparse bases, measurement matrices and reconstruction algorithms. The sparsity of speech signals in the DCT basis, the overcomplete DCT dictionary and the K-SVD dictionary is verified experimentally, and the reconstruction performance with different sparse bases, measurement matrices and reconstruction algorithms is compared. The experimental results show that the choice of sparse basis, measurement matrix and reconstruction algorithm, as well as the speech frame length and the compression ratio, all affect speech signal reconstruction.
(2) An overcomplete dictionary construction method based on speech MFCC parameters is proposed. The extraction of MFCC parameters from speech signals and the implementation of speech compressed sensing based on the overcomplete MFCC dictionary are introduced. Experiments show that speech signals are sparse in the overcomplete MFCC dictionary and that, with the same number of training utterances and the same dictionary size, the overcomplete MFCC dictionary takes much less time to train than the traditional K-SVD dictionary, which makes dictionary training easier to implement. This advantage is more obvious when more training data are available. Applying the overcomplete MFCC dictionary to speech compressed sensing is feasible and significant.
(3) A speech compression and reconstruction model based on the smoothed L0 (SL0) algorithm is proposed. The SL0 algorithm approximates the L0 norm with a smoothing function. It does not need to know the sparsity of the signal in advance, and it has low computational cost and high reconstruction quality. In addition, a new smoothing function is proposed, and the superiority of the SL0 algorithm in speech compression and reconstruction is verified with both the Gaussian function and the new smoothing function. The experimental results show that SL0 with either smoothing function outperforms traditional algorithms such as OMP and BP. Moreover, when the compression ratio is higher than 0.4, the SL0 reconstruction model based on the new smoothing function achieves better speech reconstruction quality than the SL0 reconstruction model using the standard Gaussian function.
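Point (3) relies on the smoothed L0 (SL0) idea: the count of nonzeros ||s||_0 is approximated by `N - sum_i exp(-s_i**2 / (2 * sigma**2))`, the smooth sum is maximised on the feasible set {s : A s = y} with a few projected gradient steps, and sigma is then decreased so the approximation approaches the true L0 norm. A minimal sketch with the standard Gaussian smoothing function follows. It assumes a Gaussian random matrix `a` standing in for the product of measurement matrix and sparse basis, and the step size `mu = 2.0`, the decrease factor 0.5, `sigma_min` and the helper names are illustrative assumptions, not values taken from the thesis. The new smoothing function proposed in the thesis is not given in this abstract, so it is not reproduced; the smoothing function enters only through the `delta` argument, so a different one could be tried by passing another function there.

```python
import numpy as np

def gaussian_delta(s: np.ndarray, sigma: float) -> np.ndarray:
    """Equals -sigma**2 times the gradient of sum_i exp(-s_i**2 / (2 sigma**2)).

    exp(-s**2 / (2 sigma**2)) is 1 at s = 0 and near 0 for |s| >> sigma, so
    len(s) - sum_i exp(...) approximates ||s||_0 as sigma -> 0.
    """
    return s * np.exp(-s**2 / (2.0 * sigma**2))

def sl0(a, y, sigma_min=1e-5, sigma_decrease=0.5, mu=2.0, inner_iters=3,
        delta=gaussian_delta):
    """Smoothed-L0 sparse recovery: approximately min ||s||_0 subject to a @ s = y."""
    a_pinv = np.linalg.pinv(a)
    s = a_pinv @ y                        # minimum-L2-norm solution as the starting point
    sigma = 2.0 * np.max(np.abs(s))       # large sigma: a very smooth surrogate
    while sigma > sigma_min:
        for _ in range(inner_iters):
            # steepest-ascent step on the smooth surrogate, step size mu * sigma**2
            s = s - mu * delta(s, sigma)
            # project back onto the feasible set {s : a @ s = y}
            s = s - a_pinv @ (a @ s - y)
        sigma *= sigma_decrease           # sharpen the approximation of the L0 norm
    return s

rng = np.random.default_rng(1)
n, m, k = 256, 128, 8                     # compression ratio taken as m / n = 0.5
s_true = np.zeros(n)
s_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
a = rng.normal(size=(m, n)) / np.sqrt(m)  # stands in for (measurement matrix) @ (sparse basis)
y = a @ s_true
s_hat = sl0(a, y)                         # the sparsity k is never passed to sl0
rel_err = np.linalg.norm(s_hat - s_true) / np.linalg.norm(s_true)
print(f"relative error: {rel_err:.2e}")
```

Unlike the OMP call in a greedy baseline, `sl0` is never given K, which is the property point (3) highlights. The inner loop costs only matrix-vector products once `pinv(a)` has been computed, which is consistent with the low computational cost the abstract attributes to SL0.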
【Degree-granting institution】: Nanjing University of Posts and Telecommunications
【Degree level】: Master's
【Year awarded】: 2017
【Classification number】: TN912.3
Article ID: 1859916
Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/1859916.html