Research on Machine-Learning-Based Speech Enhancement Algorithms for Dual-Microphone Mobile Phones

Published: 2018-06-15 19:27

Topic: neural network + mobile phone; Source: Nanjing Normal University, 2017 doctoral thesis

[Abstract]: Mobile phones are the portable communication device with the largest market and the broadest consumer base, and improving their call quality has long attracted wide attention. Because phones are used in so many settings, the background noise they must cope with is highly complex. A noise-reduction algorithm for the phone platform therefore has to handle many kinds of noise flexibly, suppress background noise effectively while preserving call quality, and stay robust in real environments, with no loss of performance when the user's grip changes or the phone rotates during a call. In recent years, artificial intelligence has spread to nearly every field. Machine learning, its core, emphasizes improving an algorithm's performance through continued learning from data, which lets machine-learning algorithms (such as neural networks) adapt to complex and changing external environments. Applying machine learning to phone noise reduction should markedly improve performance in real scenarios, yet little work has been done in this direction. This thesis applies neural network models to phone noise reduction and improves each part of the noise-reduction algorithm, increasing its flexibility and robustness in real use. The main work and contributions are as follows:

(1) Existing dual-channel voice activity detection (VAD) algorithms rely on fixed thresholds and struggle to separate speech from noise accurately across different noise environments. Chapter 2 proposes a new neural-network-based dual-channel VAD algorithm. It uses the sub-band energy difference and the normalized cross-channel correlation as two new types of features and uses a neural network to classify speech and noise. It does not depend on a fixed threshold, adapts to complex and changing noise environments, and is more accurate than existing VAD algorithms based on the inter-channel energy difference and its refinements.

(2) Chapter 3 exploits the fact that the ratio of the noisy-speech powers received at the phone's two microphones differs between noise segments and speech segments, and proposes a new VAD algorithm based on the inter-channel power ratio. This algorithm is then combined with the neural-network VAD of Chapter 2 to obtain a speech and noise activity detection algorithm suited to phone noise reduction, which detects speech and noise separately and accurately. The detection results control a time-domain speech enhancement algorithm that denoises the noisy speech signal. While removing noise, it significantly reduces damage to the speech signal, improves intelligibility, and is notably effective at suppressing directional speech interference.

(3) To further remove the linearly uncorrelated noise that remains after the time-domain speech enhancement of Chapter 3, Chapter 4 transforms the time-domain outputs (the enhanced speech signal and the background noise signal) to the frequency domain for further noise reduction, and improves two key components of the algorithm: noise estimation and noise removal. First, single-microphone and dual-microphone noise estimation algorithms are combined to improve the accuracy of the noise estimate. Then pitch detection is combined with noise reduction: the pitch frequency is estimated in speech frames to identify speech and noise frequency bins, and the Wiener filter parameters are adjusted separately for the two kinds of bins, so that noise is filtered while speech bins are preserved as far as possible, reducing speech distortion. Experiments show that, compared with existing dual-microphone noise-reduction algorithms, the improved frequency-domain algorithm more effectively reduces damage to the speech signal and improves call quality.

(4) Differences in how the user holds the phone, or rotation of the phone during a call, affect noise-reduction performance. If the phone's position can be determined in real time and the algorithm's parameters adjusted accordingly, performance can be improved. Most existing localization algorithms require arrays of three or more microphones and cannot be used directly on a dual-microphone phone. Chapter 5, drawing on the specifics of the phone use case, proposes a new method that locates the phone in three-dimensional space with only two microphones. The method takes as input the inter-channel time delay and a new feature, the sub-band inter-channel power ratio, derived by analyzing the propagation paths of the target speech to the two microphones, and trains a neural network to output the phone's spatial position.

(5) When the phone is detected to have deviated from the standard call position, the parameters of the time-domain and frequency-domain noise-reduction algorithms of Chapters 3 and 4 are adjusted promptly according to the neural-network localization result of Chapter 5, avoiding the drop in call performance caused by phone movement. Experiments show that existing dual-microphone noise-reduction algorithms, which ignore phone rotation, cannot guarantee performance in real scenarios, whereas the algorithm proposed in this thesis is more stable and more practical.

The thesis concludes by summarizing the main work and contributions and by outlining directions for further research.
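The abstract names its input features but does not define them. As a rough sketch only, the per-frame quantities behind contributions (1), (2) and (4) could be computed as below, assuming one common textbook definition for each. The function names, the Hann window, the equal-width band split, and the choice of four bands are illustrative assumptions, not details taken from the thesis. Inputs are same-length NumPy arrays holding one frame per microphone.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def subband_energy_difference(x1, x2, n_bands=4):
    """Per-band energy difference in dB between two same-length frames."""
    win = np.hanning(len(x1))
    p1 = np.abs(np.fft.rfft(x1 * win)) ** 2
    p2 = np.abs(np.fft.rfft(x2 * win)) ** 2
    # Assumption: contiguous, roughly equal-width bands over the FFT bins.
    e1 = np.array([band.sum() for band in np.array_split(p1, n_bands)])
    e2 = np.array([band.sum() for band in np.array_split(p2, n_bands)])
    return 10.0 * np.log10((e1 + EPS) / (e2 + EPS))


def normalized_cross_channel_correlation(x1, x2):
    """Zero-lag cross-correlation scaled by both channel energies."""
    denom = np.sqrt(np.dot(x1, x1) * np.dot(x2, x2)) + EPS
    return float(np.dot(x1, x2) / denom)


def inter_channel_power_ratio(x1, x2):
    """Mean power of channel 1 divided by mean power of channel 2."""
    return float((np.mean(x1 ** 2) + EPS) / (np.mean(x2 ** 2) + EPS))


def inter_channel_delay(x1, x2):
    """Delay of x2 relative to x1, in samples, from the cross-correlation peak."""
    corr = np.correlate(x1, x2, mode="full")
    # Index i of the full correlation corresponds to lag i - (len(x2) - 1);
    # a copy of x1 delayed by d samples peaks at lag -d.
    return int((len(x2) - 1) - np.argmax(corr))


def vad_features(x1, x2, n_bands=4):
    """Feature vector for one frame pair: band differences, correlation, ratio."""
    return np.concatenate([
        subband_energy_difference(x1, x2, n_bands),
        [normalized_cross_channel_correlation(x1, x2)],
        [inter_channel_power_ratio(x1, x2)],
    ])
```

In a setup like the one the abstract describes, a vector such as the output of `vad_features` would be fed frame by frame to a trained classifier in place of a fixed threshold. Intuitively, near-field speech tends to reach the primary microphone louder and well correlated with the secondary one, while diffuse noise tends toward a power ratio near 1 and low correlation.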
[Degree-granting institution]: Nanjing Normal University
[Degree level]: Doctoral
[Year degree granted]: 2017
[Classification number]: TN912.3; TP181



Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/2023281.html
