Research on Sound Source Localization and Enhancement Methods in Human-Computer Interaction
Published: 2019-03-27 09:40
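[Abstract]: Speech is the most natural mode of human-computer interaction. It requires no physical contact or wearable data devices, and it has no visual blind spots. In speech-based human-computer interaction systems, however, noise severely degrades system performance, and interference from other, unrelated speakers in the interaction environment is the most damaging kind. This thesis studies how to raise the signal-to-noise ratio (SNR) of the speech signal in such systems.
Locating the target sound source is the key step in multichannel speech enhancement with a microphone array. This thesis adopts a localization method based on time delay estimation. For delay estimation, noise is first suppressed with a suitable threshold and correlation processing is applied afterwards. On this basis, a threshold-decision-based method for estimating the time difference of arrival (TDOA) is proposed. Simulations show that it outperforms the generalized cross-correlation (GCC) method and supplies more accurate delay parameters for the subsequent spatial localization of the target source.
To better model the spatial scene around a real sound source, a receiving model of parallel uniform linear arrays built from six microphones is proposed. It is based on uniform linear microphone arrays and a dual-array approach to 3-D spatial localization. Combined with the threshold-decision-based TDOA estimator, this model achieves 3-D localization of the target source.
Once the target source has been located, the target speech is enhanced by beamforming. An improved scheme for setting the channel weights of the fixed beamformer is also proposed to enhance the target speech more effectively.
The proposed algorithms were simulated in detail in MATLAB. The results show that when the ambient SNR exceeds 1.5 dB, target localization accuracy reaches 98% or higher and the SNR is improved by about 5 dB. In addition, the algorithms use relatively few microphones, are simple in principle, and are easy to implement in hardware.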
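A short Python sketch can illustrate the threshold-then-correlate TDOA idea. The abstract does not state the threshold rule, so this example assumes a hypothetical one. Every sample whose magnitude falls below `ratio` times the channel's peak magnitude is set to zero, the two channels are then cross-correlated, and the lag with the largest correlation is taken as the delay. The helper names (`hard_threshold`, `cross_correlation_delay`, `threshold_tdoa`), the value `ratio=0.3`, and the synthetic white-noise source are illustrative assumptions. The sketch does not reproduce the thesis's comparison against GCC.

```python
import math
import random


def hard_threshold(x, ratio):
    """Zero every sample whose magnitude is below ratio * max|x| (assumed threshold rule)."""
    level = ratio * max(abs(v) for v in x)
    return [v if abs(v) >= level else 0.0 for v in x]


def cross_correlation_delay(x1, x2, max_lag):
    """Return the lag d in [-max_lag, max_lag] maximizing sum_i x1[i] * x2[i + d].

    A positive d means the signal reaches microphone 2 d samples later than microphone 1.
    Both inputs are assumed to have the same length.
    """
    n = len(x1)
    best_lag, best_val = 0, -math.inf
    for d in range(-max_lag, max_lag + 1):
        # Only sum over indices where both x1[i] and x2[i + d] exist.
        acc = sum(x1[i] * x2[i + d] for i in range(max(0, -d), min(n, n - d)))
        if acc > best_val:
            best_lag, best_val = d, acc
    return best_lag


def threshold_tdoa(x1, x2, max_lag, ratio=0.3):
    """Hypothetical threshold-decision TDOA: threshold both channels, then cross-correlate."""
    return cross_correlation_delay(hard_threshold(x1, ratio), hard_threshold(x2, ratio), max_lag)


# Synthetic test: a white-noise source reaches mic 2 seven samples after mic 1,
# and each microphone adds its own independent noise.
rng = random.Random(0)
n, true_delay = 2000, 7
src = [rng.gauss(0.0, 1.0) for _ in range(n + true_delay)]
mic1 = [src[i + true_delay] + rng.gauss(0.0, 0.3) for i in range(n)]
mic2 = [src[i] + rng.gauss(0.0, 0.3) for i in range(n)]

estimated = threshold_tdoa(mic1, mic2, max_lag=20)
print("estimated delay (samples):", estimated)
```

Dividing the lag by the sampling rate gives the delay in seconds. Sub-sample precision would require interpolating around the correlation peak, which this sketch omits.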
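A generic TDOA least-squares grid search can illustrate the 3-D localization step. The abstract does not give the six-microphone geometry or the thesis's dual-array solution, so the following assumes a hypothetical layout. Two parallel three-microphone uniform linear arrays lie along the x-axis with 0.1 m spacing, placed 0.2 m apart, all in the plane z = 0. The sketch also assumes a speed of sound of 343 m/s and noise-free delays measured relative to microphone 0. The search picks the candidate point whose predicted TDOAs best match the measured ones. It stands in for the method in the thesis and is not that method.

```python
import itertools
import math

C = 343.0  # assumed speed of sound in air (m/s)

# Hypothetical layout: two parallel 3-microphone uniform linear arrays along x,
# 0.1 m element spacing, 0.2 m apart in y, all lying in the plane z = 0.
MICS = [(-0.1, 0.0, 0.0), (0.0, 0.0, 0.0), (0.1, 0.0, 0.0),
        (-0.1, 0.2, 0.0), (0.0, 0.2, 0.0), (0.1, 0.2, 0.0)]


def tdoas(p, mics=MICS, c=C):
    """Arrival-time differences (s) of mics[1:] relative to mics[0] for a point source at p."""
    d0 = math.dist(p, mics[0])
    return [(math.dist(p, m) - d0) / c for m in mics[1:]]


def locate(measured, xs, ys, zs, mics=MICS, c=C):
    """Grid search: return the candidate point whose predicted TDOAs best fit `measured`."""
    best, best_err = None, math.inf
    for p in itertools.product(xs, ys, zs):
        err = sum((a - b) ** 2 for a, b in zip(tdoas(p, mics, c), measured))
        if err < best_err:
            best, best_err = p, err
    return best


# Candidate grid with 0.1 m steps. Only z > 0 is searched, because a planar
# array cannot distinguish a source at +z from its mirror image at -z.
xs = [round(-1.0 + 0.1 * i, 2) for i in range(21)]
ys = [round(-1.0 + 0.1 * j, 2) for j in range(21)]
zs = [round(0.1 + 0.1 * k, 2) for k in range(20)]

source = (xs[13], ys[15], zs[9])  # (0.3, 0.5, 1.0) m
estimate = locate(tdoas(source), xs, ys, zs)
print("true:", source, "estimated:", estimate)
```

With measured, noisy delays the minimum error is no longer zero, and the grid step limits precision. A finer grid or a local refinement around the best point would be the natural next step.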
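A weighted delay-and-sum beamformer, a common form of fixed beamformer, can illustrate the enhancement stage. The abstract does not describe the improved weighting scheme. This sketch therefore compares uniform weights with inverse-noise-variance weights, a standard choice for uncorrelated noise of unequal power, swapped in here for illustration only. The steering delays (integer samples, as if supplied by the localization stage), the per-channel noise levels, and the sinusoidal target are hypothetical. Because the beamformer is linear, the clean target and the noise are passed through it separately, so the output SNR can be measured exactly.

```python
import math
import random


def delay_and_sum(channels, delays, weights):
    """Weighted delay-and-sum: advance channel i by delays[i] samples, then form a weighted average.

    Weights are normalized by their sum, so a perfectly aligned target passes with unit gain.
    """
    total = sum(weights)
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(w * ch[t + d] for ch, d, w in zip(channels, delays, weights)) / total
            for t in range(n)]


def power(x):
    return sum(v * v for v in x) / len(x)


def snr_db(signal, noise):
    return 10 * math.log10(power(signal) / power(noise))


rng = random.Random(1)
N = 4000
delays = [0, 2, 4, 6, 8, 10]                 # hypothetical steering delays (samples)
noise_std = [0.5, 0.5, 0.5, 0.5, 2.0, 2.0]   # hypothetical: two much noisier channels
length = N + max(delays)
src = [math.sin(2 * math.pi * 0.01 * t) for t in range(length)]
# Channel i hears the target delayed by delays[i] samples: sig[i][t + delays[i]] == src[t].
sig = [[0.0] * d + src[:length - d] for d in delays]
noise = [[rng.gauss(0.0, s) for _ in range(length)] for s in noise_std]


def output_snr(weights):
    return snr_db(delay_and_sum(sig, delays, weights), delay_and_sum(noise, delays, weights))


input_snr = snr_db(sig[0][:N], noise[0][:N])  # single-microphone reference (channel 0)
uniform_snr = output_snr([1.0] * len(delays))
inv_var_snr = output_snr([1.0 / s ** 2 for s in noise_std])
print(f"input {input_snr:.1f} dB, uniform {uniform_snr:.1f} dB, inverse-variance {inv_var_snr:.1f} dB")
```

With equal, uncorrelated noise in all six channels, both weightings reduce to the plain average, and the expected gain is 10*log10(6), about 7.8 dB. In this unequal setting, the uniform average leaves the expected output noise power equal to that of channel 0: (4*0.25 + 2*4)/36 = 0.25. Inverse-variance weighting lowers it to about 0.06. This illustrates why the per-channel weight setting matters.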
[Degree-granting institution]: South China University of Technology
[Degree level]: Master's
[Year awarded]: 2014
[Classification number]: TN912.3
Article ID: 2448072
Article link: http://sikaile.net/kejilunwen/wltx/2448072.html