Research and Application of Speech Endpoint Detection Algorithms
Published: 2018-08-23 19:51
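【Abstract】: Speech endpoint detection (also called voice activity detection, VAD) determines whether speech is present in a speech signal mixed with noise. It is commonly used in speech processing systems such as speech coding and speech enhancement, where it helps lower the speech coding rate, use less communication bandwidth, improve the efficiency of mobile devices, and recognize speech information accurately. In speech signal analysis, the noisy input signal must first be examined so that the useful segments are located accurately, which reduces the amount of data to process and improves processing efficiency. The traditional dual-threshold endpoint detection algorithm is accurate in noise-free conditions, but in real noisy environments, especially at low signal-to-noise ratio (SNR), its detection accuracy is low. Taking the gender of the speaker as prior information, this thesis improves the wavelet energy entropy endpoint detection algorithm. Experimental data show that the improved wavelet energy entropy algorithm effectively increases endpoint detection accuracy. The main contents and results of this thesis are as follows:
1. A speech signal analysis method based on statistics of speech attributes is proposed. Existing speech analysis methods focus mainly on features such as short-time energy, short-time zero-crossing rate, pitch period, formant frequencies, and Mel cepstral coefficients. Based on the articulation characteristics of different speakers, this thesis analyzes the speech signal along multiple dimensions, including short-time energy variance, Mel cepstral distance variance, and MFCC cepstral distance variance. The 239-dimensional data extracted from the speech signal are reduced with the Relief [1] feature selection algorithm to build a reasonable feature set. Experiments show that introducing these attribute statistics noticeably improves the accuracy of speech information recognition.
2. Based on the articulation characteristics of speakers of different genders, the concept of a fuzzy membership function is introduced to detect the speaker's gender from the speech signal. A fuzzy membership function model is built from the pitch-frequency contours of male and female speakers and gives a preliminary judgment of the speaker's gender. For speech files whose gender cannot be determined reliably from the fuzzy membership degree, a decision tree model is then applied. Experiments show that, at low SNR, this hybrid model considerably improves speaker gender recognition and performs well.
3. Given accurate gender information, the thesis analyzes the strengths and weaknesses of the wavelet algorithm and the wavelet energy entropy algorithm for speech endpoint detection, and improves the accuracy of the wavelet energy entropy algorithm. Finally, the improved algorithm is tested and analyzed in simulation on noisy speech files. The experimental data show that, under different noise backgrounds at an SNR of 5 dB, the algorithm accurately detects speech and non-speech segments, significantly reduces information loss, and considerably improves accuracy.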
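Item 1 reduces 239 candidate features with the Relief [1] feature selection algorithm. As a rough illustration only, the sketch below implements the basic two-class Relief weighting: for randomly sampled instances it finds the nearest same-class neighbour (hit) and nearest other-class neighbour (miss) and rewards features that separate misses more than hits. The function name `relief`, the scaling of features to [0, 1], the Manhattan distance, and the default of one iteration per sample are assumptions; the abstract does not state which Relief variant, parameters, or feature matrix the thesis used. Each class needs at least two samples.

```python
import numpy as np

def relief(X, y, n_iter=None, rng=None):
    """Basic two-class Relief feature weights (illustrative sketch).

    X: (n_samples, n_features) feature matrix; y: (n_samples,) binary labels.
    Returns one weight per feature; a larger weight means a more relevant feature.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    rng = np.random.default_rng(rng)
    m = n if n_iter is None else n_iter

    # Scale every feature to [0, 1] so per-feature differences are comparable.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    Xn = (X - lo) / span

    w = np.zeros(d)
    for _ in range(m):
        i = rng.integers(n)
        dist = np.abs(Xn - Xn[i]).sum(axis=1)  # Manhattan distance to every sample
        dist[i] = np.inf                        # never pick the sample itself
        same = y == y[i]
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class sample
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class sample
        # Penalise differences to the hit, reward differences to the miss.
        w -= np.abs(Xn[i] - Xn[hit]) / m
        w += np.abs(Xn[i] - Xn[miss]) / m
    return w
```

Features can then be ranked with `np.argsort(w)[::-1]` and only the top-ranked ones kept; the cut-off is a design choice that the abstract does not specify.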
[Abstract]: Voice endpoint detection (also known as voice activity detection, VAD) detects the presence or absence of speech in noisy speech signals. Speech endpoint detection is commonly used in speech coding, speech enhancement, and other speech processing systems, where it helps reduce the speech coding rate, occupy less communication bandwidth, improve the efficiency of mobile devices, and recognize speech information accurately. In speech signal analysis, the noisy input signal must first be examined to locate the useful segments accurately, which reduces the amount of data to be processed and improves processing efficiency. The traditional dual-threshold speech endpoint detection algorithm is highly accurate in a noise-free environment, but its accuracy is low in real noisy environments, especially at low signal-to-noise ratio (SNR). In this paper, the wavelet energy entropy endpoint detection algorithm is improved using the speaker's gender as prior information. Experimental data show that the improved wavelet energy entropy algorithm effectively improves the accuracy of endpoint detection. The main contents and results of this paper are as follows: 1. A speech signal analysis method based on speech attribute statistics is presented. Existing speech analysis methods mainly focus on features such as short-time energy, short-time zero-crossing rate, pitch period, formant frequency, and Mel cepstral coefficients. Based on the pronunciation characteristics of different speakers, this paper analyzes speech signals along multiple dimensions: short-time energy variance, Mel cepstral distance variance, and MFCC cepstral distance variance.
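The traditional dual-threshold algorithm that this abstract uses as its baseline combines short-time energy with the short-time zero-crossing rate (ZCR). A minimal sketch follows, assuming the first few frames contain only background noise. The helper names (`frame_signal`, `dual_threshold_vad`) and every threshold constant (low threshold at 4 times the noise energy, high threshold at a quarter of the peak frame energy, ZCR threshold at the noise mean plus two standard deviations) are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def dual_threshold_vad(x, frame_len=256, hop=128, noise_frames=10):
    """Energy/ZCR double-threshold endpoint detection (illustrative sketch).

    Assumes the first `noise_frames` frames are background noise only.
    Returns a list of (start_sample, end_sample) pairs.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = np.sum(frames ** 2, axis=1)                          # short-time energy
    zcr = np.mean(np.diff(np.sign(frames), axis=1) != 0, axis=1)  # short-time ZCR

    # Thresholds estimated from the leading noise-only frames (assumed constants).
    e_noise = energy[:noise_frames].mean() + 1e-12
    low = 4.0 * e_noise
    high = max(0.25 * energy.max(), 2.0 * low)
    zcr_thr = zcr[:noise_frames].mean() + 2.0 * zcr[:noise_frames].std()

    n = len(energy)
    segments = []
    i = 0
    while i < n:
        if energy[i] <= high:
            i += 1
            continue
        # Stage 1: a frame above the high energy threshold anchors a segment.
        start = end = i
        # Stage 2: grow the segment while energy stays above the low threshold.
        while start > 0 and energy[start - 1] > low:
            start -= 1
        while end < n - 1 and energy[end + 1] > low:
            end += 1
        # Stage 3: extend over high-ZCR frames (weak unvoiced onsets and offsets).
        while start > 0 and zcr[start - 1] > zcr_thr:
            start -= 1
        while end < n - 1 and zcr[end + 1] > zcr_thr:
            end += 1
        segments.append((start * hop, end * hop + frame_len))
        i = end + 1
    return segments
```

Because both energy thresholds are tied to frame energy, speech and noise energies move closer together as SNR drops, which is the weakness of this baseline that the abstract points out.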
The 239-dimensional data extracted from the speech signal are reduced with the Relief [1] feature selection algorithm to establish a reasonable feature set. The experimental results show that introducing speech attribute statistics noticeably improves the accuracy of speech information recognition. 2. Based on the pronunciation characteristics of speakers of different genders, the concept of a fuzzy membership function is introduced to detect the speaker's gender from the speech signal. A fuzzy membership function model built from the pitch-frequency curves of male and female speakers gives a preliminary judgment of the speaker's gender. For speech files whose gender cannot be determined reliably from the fuzzy membership degree, a decision tree model is then applied. The experimental results show that this hybrid model considerably improves speaker gender recognition at low SNR. 3. Given accurate speaker gender information, this paper analyzes the advantages and disadvantages of the wavelet algorithm and the wavelet energy entropy algorithm for speech endpoint detection, and improves the accuracy of the wavelet energy entropy algorithm. Finally, the improved wavelet energy entropy algorithm is tested and analyzed on noisy speech files in simulation. Experimental data show that, under different noise backgrounds at an SNR of 5 dB, the algorithm accurately detects speech and non-speech segments, significantly reduces information loss, and improves accuracy.
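For item 3, the abstract does not describe how the thesis improved the wavelet energy entropy algorithm, so the sketch below shows only the basic, unimproved feature and uses no gender information. Each frame is decomposed with a Haar wavelet (an assumption; the thesis's wavelet and decomposition depth are not stated), the subband energies E_j are normalised to p_j = E_j / sum_k E_k, and the frame entropy H = -sum_j p_j ln p_j is compared with a threshold learned from leading noise-only frames. Voiced frames typically concentrate their energy in a few low bands and so give lower entropy than broadband noise. The names `haar_step`, `wavelet_energy_entropy`, and `wee_vad`, the threshold rule (noise mean minus k standard deviations), and the minimum segment length are hypothetical.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    x = x[: len(x) // 2 * 2]
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def wavelet_energy_entropy(frame, levels=4):
    """Shannon entropy of the relative energies of a frame's wavelet subbands."""
    a = np.asarray(frame, dtype=float)
    energies = []
    for _ in range(levels):
        a, d = haar_step(a)
        energies.append(np.sum(d ** 2))  # detail band at this level
    energies.append(np.sum(a ** 2))      # final approximation band
    e = np.array(energies)
    total = e.sum()
    if total == 0.0:
        return float(np.log(len(e)))     # all-zero frame: report maximal entropy
    p = e / total
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def wee_vad(x, frame_len=256, hop=128, noise_frames=10, k=3.0, min_frames=5):
    """Mark runs of low-entropy frames as speech (hypothetical threshold rule)."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    h = np.array([wavelet_energy_entropy(x[i * hop: i * hop + frame_len])
                  for i in range(n_frames)])
    ref = h[:noise_frames]                  # assumed noise-only frames
    is_speech = h < ref.mean() - k * ref.std()

    segments, start = [], None
    for i, flag in enumerate(np.append(is_speech, False)):  # sentinel closes last run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_frames:     # drop very short runs
                segments.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    return segments
```

Because H depends only on the relative distribution of energy across subbands, scaling a frame leaves it unchanged; the decision therefore reflects spectral shape rather than loudness.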
【Degree-granting institution】: Xi'an University of Architecture and Technology
【Degree level】: Master's
【Year of degree】: 2016
【Classification number】: TN912.3
Article ID: 2199738
Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/2199738.html