Design and Implementation of Speech Recognition and a VUI System in Strong-Noise Environments
Published: 2018-09-18 13:31
[Abstract]: With the rapid development of human-computer interaction technology, the voice user interface (VUI) has gradually become a research hotspot both domestically and internationally. A VUI system replaces traditional keyboard input with voice input, making human-computer interaction more convenient and user-friendly. In practical applications, however, environmental noise is complex, and a VUI often encounters a mismatch between the recognition environment and the training environment, which lowers the speech recognition rate. This thesis therefore combines empirical mode decomposition (EMD), the Hilbert-Huang transform (HHT), and dual-microphone noise cancellation to improve the recognition rate of a VUI system in strong-noise environments, thereby providing reliable human-computer interaction for aircraft maintenance-assistance equipment. The main contents are as follows. First, based on the research status and development trends of VUI systems domestically and internationally, the thesis analyzes the current requirements of civil aviation maintenance assistance and sets out the practical problems that exist and the aspects that need improvement. Second, earlier speech endpoint detection algorithms generally rely on time-domain features of the speech signal, such as short-time energy and the short-time average zero-crossing rate. Their energy calculation is not entirely reasonable, and they perform poorly at low signal-to-noise ratio (SNR). This thesis studies a speech endpoint detection technique based on EMD and the Teager energy operator, which combines the strengths of both in characterizing nonlinear, non-stationary signals: EMD decomposes the speech signal to achieve preliminary denoising, and Teager energy then replaces short-time energy for endpoint detection.
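The endpoint-detection idea described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the EMD denoising stage is omitted, and the frame length, hop size, noise-floor estimate, threshold factor, and synthetic test signal are all assumptions chosen only for illustration. It uses the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], in place of short-time energy.

```python
import math
import random


def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1]*x[n+1] for interior samples."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def frame_teager(x, frame_len, hop):
    """Mean absolute Teager energy of each frame."""
    psi = teager_energy(x)
    energies = []
    for start in range(0, len(psi) - frame_len + 1, hop):
        frame = psi[start:start + frame_len]
        energies.append(sum(abs(v) for v in frame) / frame_len)
    return energies


def detect_endpoints(x, frame_len=80, hop=40, noise_frames=5, factor=3.0):
    """Return (first, last) active frame indices, or None if nothing is detected.

    Assumption: the first `noise_frames` frames contain only background noise,
    and a frame is 'speech' when its energy exceeds `factor` times that floor.
    """
    energies = frame_teager(x, frame_len, hop)
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    threshold = factor * noise_floor
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]


# Synthetic test signal (hypothetical): weak noise with a 400 Hz burst
# between samples 800 and 2400 at an assumed 8 kHz sampling rate.
rng = random.Random(0)
fs = 8000
signal = []
for n in range(3200):
    sample = rng.uniform(-0.01, 0.01)
    if 800 <= n < 2400:
        sample += math.sin(2 * math.pi * 400 * (n - 800) / fs)
    signal.append(sample)

endpoints = detect_endpoints(signal)
print("active frames:", endpoints)
```

For a pure sinusoid A*sin(w*n) the Teager energy is the constant A^2*sin^2(w), so it depends on frequency as well as amplitude, whereas short-time energy depends on amplitude alone.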
Third, for denoising, the traditional approach acquires noisy speech with a single microphone and then applies wavelet transforms, spectral subtraction, or similar methods. Because noise at an aircraft maintenance site has a wider frequency distribution and larger amplitude, this thesis introduces dual-microphone adaptive noise cancellation: one microphone captures the noisy speech and the other captures the background noise. A recursive least squares (RLS) adaptive algorithm cancels the two signals against each other in the time domain, removing as much of the noise as possible while preserving the useful speech, and thus improving the signal-to-noise ratio (SNR). Fourth, the thesis describes in detail how each module of the VUI system built on these two techniques is implemented and how the modules communicate with one another. The design adopts a client-server (C/S) architecture that makes effective use of both the client and the server. After 10 rounds of adaptive training of the hidden Markov model (HMM), improvements are made in three areas: the speech templates, the noise threshold, and the secondary-recognition speech library. Test experiments on speech signals are carried out, and the recognition-rate results of the designed VUI are reported. The analysis shows that the VUI system has stronger noise robustness and a recognition rate about 5% higher than that of previous VUI systems.
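The dual-microphone cancellation step can be sketched with a textbook RLS adaptive filter. This is an illustration under stated assumptions, not the thesis's code: the filter order, forgetting factor `lam`, initialization constant `delta`, and the simulated two-tap acoustic path between the microphones are all hypothetical. The reference microphone signal drives the filter, and the a priori error serves as the enhanced speech estimate.

```python
import math
import random


def rls_noise_canceller(primary, reference, order=4, lam=0.999, delta=0.01):
    """Standard RLS adaptive noise canceller.

    primary:   noisy speech samples (speech + filtered noise)
    reference: noise-only samples from the second microphone
    Returns (enhanced, weights); enhanced[n] is the a priori error e(n).
    """
    w = [0.0] * order
    # Inverse correlation matrix, initialized to I / delta.
    P = [[(1.0 / delta if i == j else 0.0) for j in range(order)]
         for i in range(order)]
    x = [0.0] * order
    enhanced = []
    for d, u in zip(primary, reference):
        x = [u] + x[:-1]  # newest reference sample first
        Px = [sum(P[i][j] * x[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(x[i] * Px[i] for i in range(order))
        k = [v / denom for v in Px]                      # gain vector
        e = d - sum(w[i] * x[i] for i in range(order))   # a priori error
        w = [w[i] + k[i] * e for i in range(order)]
        # P stays symmetric, so x^T P equals (P x)^T.
        P = [[(P[i][j] - k[i] * Px[j]) / lam for j in range(order)]
             for i in range(order)]
        enhanced.append(e)
    return enhanced, w


def snr_db(clean, observed):
    """SNR in dB of `observed` relative to the known clean signal."""
    sig = sum(c * c for c in clean)
    err = sum((o - c) ** 2 for c, o in zip(clean, observed))
    return 10.0 * math.log10(sig / err)


# Hypothetical simulation: a 200 Hz tone stands in for speech; the noise
# reaches the primary microphone through an assumed path [0.8, 0.3].
rng = random.Random(0)
n_samples = 4000
speech = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(n_samples)]
reference = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
primary = [speech[n] + 0.8 * reference[n]
           + (0.3 * reference[n - 1] if n > 0 else 0.0)
           for n in range(n_samples)]

enhanced, weights = rls_noise_canceller(primary, reference)
skip = 1000  # ignore the initial convergence period
snr_before = snr_db(speech[skip:], primary[skip:])
snr_after = snr_db(speech[skip:], enhanced[skip:])
print("SNR before: %.1f dB, after: %.1f dB" % (snr_before, snr_after))
```

The sketch assumes the reference microphone picks up no speech. In a real setup, speech leaking into the reference channel would be partially cancelled along with the noise.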
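[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year degree granted]: 2014
[Classification number]: TN912.34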
Article ID: 2248071
Article link: http://sikaile.net/kejilunwen/wltx/2248071.html