Research on Key Technologies of Compressed Sensing for Speech Signals
Topics: speech signals + compressed sensing; Source: Ye Lei, doctoral dissertation, Nanjing University of Posts and Telecommunications, 2014
[Abstract]: Signal sparsity is the prerequisite for applying compressed sensing (CS) theory. Compressed sensing samples a signal compressively with as few measurements (observations) as possible, which reduces the signal's dimensionality, lowers the cost of sampling and transmission, and has brought a revolution to signal sampling technology. Because speech signals are approximately sparse, CS theory can be combined with speech signal processing, breaking with the classical speech processing paradigm built on Nyquist sampling. Replacing conventional Nyquist-rate speech samples with CS measurement sequences fundamentally changes the signal's characteristics and therefore affects every area of speech processing. Building on an in-depth study of CS theory, this dissertation investigates the sparse domains for CS of speech and a voice endpoint detection algorithm that operates on the measurement sequence, proposes a measurement matrix suited to speech, studies a model of the measurement sequence obtained under that matrix, and proposes a reconstruction algorithm for speech CS that combines codebook mapping with l1 reconstruction. The main work and contributions are as follows:
(1) The dissertation studies CS reconstruction of speech measurement sequences in different sparse domains and compares the sparsity of speech under the DCT, DFT, DWT, and K-L (Karhunen-Loève) transforms. Speech coefficients are sparsest under the K-L transform, but reconstruction then requires the autocorrelation matrix of the original signal, which makes it hard to use in practice; among the other three domains, the DCT gives the sparsest representation. The principles and performance of basis pursuit (BP) and orthogonal matching pursuit (OMP) reconstruction under random Gaussian projection are examined. Experiments show that, for speech with the same number of measurements, BP reconstructs better than OMP but at higher computational complexity. CS of speech with an overcomplete cosine dictionary and a K-SVD dictionary is also studied: because the coefficients are sparser, both dictionaries reconstruct better than the DCT basis, and the K-SVD dictionary outperforms the overcomplete cosine dictionary. Based on the finding that the spectral magnitude distributions of the measurement sequences of speech frames and non-speech frames are dispersed and differ markedly, a voice endpoint detection algorithm based on the cepstral distance of CS measurement sequences is proposed, so that the nature of the original input speech can be judged directly from the measurement sequence. In simulations on noisy speech at different signal-to-noise ratios, its performance is comparable to cepstral endpoint detection under conventional Nyquist sampling, while requiring less computation.
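As an illustration of the reconstruction setting in item (1), the following is a minimal sketch of OMP recovery from random Gaussian measurements with a DCT sparse basis. It uses a synthetic, exactly sparse coefficient vector rather than real speech, and the function names (`dct_basis`, `omp`, `demo`) and the sizes `n`, `m`, `k` are assumptions chosen for the example, not values from the dissertation.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II synthesis matrix Psi (n x n), so that x = Psi @ s."""
    k = np.arange(n)[:, None]            # frequency index (rows of the analysis matrix)
    t = np.arange(n)[None, :]            # time index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)              # rescale the DC row so that C is orthonormal
    return C.T                           # synthesis is the transpose of analysis

def omp(A, y, k):
    """Orthogonal matching pursuit: greedy k-sparse solution of y ~= A @ s."""
    residual = y.copy()
    support = []
    s = np.zeros(A.shape[1])
    for _ in range(k):
        # Select the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    s[support] = coef
    return s

def demo(n=256, m=128, k=8, seed=0):
    """Return the relative reconstruction error for a synthetic k-sparse signal."""
    rng = np.random.default_rng(seed)
    psi = dct_basis(n)
    s_true = np.zeros(n)
    idx = rng.choice(n, size=k, replace=False)
    s_true[idx] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
    x = psi @ s_true                               # time-domain frame
    phi = rng.normal(size=(m, n)) / np.sqrt(m)     # random Gaussian measurement matrix
    y = phi @ x                                    # compressed measurements
    x_hat = psi @ omp(phi @ psi, y, k)             # recover coefficients, then the frame
    return np.linalg.norm(x - x_hat) / np.linalg.norm(x)
```

Real speech frames are only approximately sparse in the DCT domain, so the error on speech would not be near zero as it is for this synthetic case. A BP comparison would replace `omp` with an l1-minimization solver, which is omitted here.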
(2) With a DCT sparse basis and a random Gaussian measurement matrix, CS reconstruction of speech locates zero (or near-zero) coefficients poorly, which causes large errors in the coefficient values that dominate reconstruction quality. To address this problem, a row echelon measurement matrix suited to compressive sampling of speech is proposed, and the compressed measurement sequence is reconstructed with a dual affine scaling interior-point algorithm. Simulations show that a row echelon measurement matrix locates the zero (or near-zero) coefficients of speech well and therefore yields clearly better reconstruction than a Gaussian measurement matrix; it also requires far less data and computation than a random Gaussian matrix. The author therefore regards the row echelon measurement matrix as a near-ideal projection matrix for compressive sampling of speech.
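The abstract does not define the row echelon matrix. Purely to illustrate why such a matrix can need less data and computation than a dense Gaussian one, the sketch below assumes one simple staircase form in which each row holds a run of ones that starts where the run in the row above ends. The function names and this specific layout are hypothetical and may differ from the dissertation's construction.

```python
import numpy as np

def staircase_matrix(m, n):
    """Hypothetical m x n 0/1 matrix in row echelon form (assumed layout).

    Row i holds ones in columns i*block .. (i+1)*block - 1, so every leading
    entry lies strictly to the right of the leading entry in the row above.
    """
    if n % m != 0:
        raise ValueError("this sketch assumes n is a multiple of m")
    block = n // m
    phi = np.zeros((m, n))
    for i in range(m):
        phi[i, i * block:(i + 1) * block] = 1.0
    return phi

def measure_fast(x, m):
    """Compute staircase_matrix(m, len(x)) @ x using block sums only."""
    x = np.asarray(x, dtype=float)
    if x.size % m != 0:
        raise ValueError("this sketch assumes len(x) is a multiple of m")
    return x.reshape(m, -1).sum(axis=1)
```

Under this assumed layout the matrix has only n nonzero entries instead of m*n, needs no stored random values, and each measurement costs n/m additions with no multiplications.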
(3) Because the speech measurement sequence obtained under row echelon projection remains strongly correlated, the dissertation proposes a second modeling stage that applies a Volterra series to the measurement sequence, analyzes how the input-sequence dimension and the model order affect prediction of the row echelon measurement sequence, and adds a Wiener filter to improve prediction accuracy. This enables CS reconstruction based on a partial CS measurement sequence, the Volterra model, and the Wiener filter.
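To make the prediction idea in item (3) concrete, here is a minimal sketch of a one-step predictor built from a Volterra series truncated at second order and fitted by least squares. The truncation order, the least-squares fit, and the names `volterra_features`, `fit_volterra`, and `predict_next` are assumptions for illustration; the abstract does not state how the dissertation estimates the kernels, and the Wiener filtering stage is not covered.

```python
import numpy as np

def volterra_features(window):
    """Constant, linear, and second-order product terms of one input window."""
    w = np.asarray(window, dtype=float)
    upper = np.triu_indices(len(w))          # each pair (i <= j) once
    quad = np.outer(w, w)[upper]             # products w[i] * w[j]
    return np.concatenate(([1.0], w, quad))

def fit_volterra(y, dim):
    """Least-squares kernel for predicting y[t] from the window y[t-dim:t]."""
    y = np.asarray(y, dtype=float)
    rows = np.array([volterra_features(y[t - dim:t]) for t in range(dim, len(y))])
    kernel, *_ = np.linalg.lstsq(rows, y[dim:], rcond=None)
    return kernel

def predict_next(recent, kernel):
    """Predict the next value from the last `dim` values of the sequence."""
    return float(volterra_features(recent) @ kernel)
```

Here `dim` plays the role of the input-sequence dimension mentioned in the text. The number of kernel terms grows quadratically with `dim`, which is one reason the dimension and the model order have to be chosen with care.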
(4) Finally, to address the heavy computation of CS reconstruction algorithms, a codebook-mapping reconstruction method based on the relationship between the measurement sequence and the original sequence is proposed. Compared with l1 reconstruction, it estimates the positions of the sparse coefficients more accurately and needs no optimization algorithm: the reconstruction coefficients are read directly from a trained codebook, so reconstruction requires markedly less computation than BP or OMP. Because its estimates of the coefficient magnitudes are not accurate enough, however, codebook mapping is combined with l1 reconstruction to balance reconstruction quality against computation. In the training stage the algorithm builds a speech codebook and a measurement codebook. In the test stage it first estimates the SNR of the test speech and then selects an energy threshold according to that SNR and the CS compression ratio; frames whose measurement-sequence energy exceeds the threshold are reconstructed with l1 minimization, and frames below it are reconstructed from the codebook. Experiments show that at low and medium SNR the joint algorithm, with a suitable energy threshold, reconstructs better than l1 reconstruction alone; at high SNR and in noise-free conditions it achieves performance comparable to l1 reconstruction when codebook-reconstructed frames make up about 3/10 of all frames. Because the codebook part of the joint algorithm avoids the computationally expensive nonlinear optimization, it saves the corresponding computation.
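The frame-by-frame switching described in item (4) can be sketched as follows. This is an illustrative outline under stated assumptions: the paired codebooks are taken as already trained, the codebook lookup is assumed to be a nearest-neighbour search in the measurement codebook, the frame energy is taken as the sum of squared measurements, and `l1_solver` is a placeholder callable standing in for whatever l1 reconstruction routine is used. None of these names or choices come from the dissertation.

```python
import numpy as np

def codebook_reconstruct(y, obs_codebook, speech_codebook):
    """Return the speech codeword paired with the nearest measurement codeword."""
    dists = np.sum((obs_codebook - y) ** 2, axis=1)   # squared distance to each entry
    return speech_codebook[int(np.argmin(dists))]

def joint_reconstruct(frames, threshold, l1_solver, obs_codebook, speech_codebook):
    """Use l1 reconstruction for high-energy frames and the codebook otherwise."""
    out = []
    for y in frames:
        energy = float(np.sum(y ** 2))                # frame energy of the measurements
        if energy > threshold:
            out.append(l1_solver(y))                  # costly optimization path
        else:
            out.append(codebook_reconstruct(y, obs_codebook, speech_codebook))
    return np.array(out)
```

In this outline the threshold is simply passed in; in the scheme the abstract describes, it would be selected from the estimated SNR and the CS compression ratio. Raising the threshold sends more frames to the cheap lookup path and fewer to the optimizer.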
[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Doctoral
[Year degree conferred]: 2014
[Classification number]: TN912.3
Article No.: 1948685
Article link: http://sikaile.net/kejilunwen/wltx/1948685.html