Research on Improved Feature Extraction Algorithms for Speaker Recognition
Topic: MFCC; Focus: smoothed amplitude-spectrum envelope; Source: Taiyuan University of Technology, master's thesis, 2015
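[Abstract]: Speaker recognition is speech recognition in the broad sense. Its basic idea is to determine a speaker's identity from the characteristics of his or her speech. In recent years, as science and technology have advanced, the demands that many fields place on speaker recognition have kept rising, and the technology now faces serious difficulties. On the one hand, the feature parameters used in speaker recognition vary with the speaker's physical condition, emotional state, and speaking environment. On the other hand, speaker recognition is concerned not with the semantic content of the speech signal but with the speaker-specific (individual) information it carries. Identifying a speaker accurately therefore requires cleanly separating semantic information from speaker-specific information, yet no current technique can separate the two completely. This thesis studies these problems.

MFCC parameters describe the spectral envelope of the signal. This envelope mainly characterizes the speaker's vocal tract, so the parameters neglect the influence of the pitch (fundamental frequency) on the features. To address this, the thesis proposes an improved algorithm. When extracting MFCCs, the signal spectrum is not passed directly through the Mel filter bank. Instead, it is first smoothed with a moving-average filter to obtain an approximation of the spectral envelope, and the result is then passed through the Mel filter bank. Building on this, the spectrum is estimated with the multitaper (multi-window) method instead of a Hamming-windowed DFT, yielding a new feature parameter, MTSMFCC. Experiments show that a speaker recognition system based on MTSMFCC is more robust both to noise and over time.

To address the low recognition rate of a single feature parameter in noisy environments, the thesis carries out three kinds of fusion based on the original MFCC:
1. To make the features fully reflect the dynamic characteristics of speech, the original MFCCs are fused with their first-order difference (delta) parameters, giving Fusion1.
2. To fully capture the low-, mid-, and high-frequency information of speech, MFCC, IMFCC (inverted MFCC), and MidMFCC (mid-frequency MFCC) are fused, giving Fusion2.
3. Building on the first two, Fusion1 and Fusion2 are fused to obtain a new feature parameter, NMFCC.

NMFCC not only matches the auditory characteristics of the human ear but also contains the low-, mid-, and high-frequency information of the speech signal, so it reflects speaker-specific information more comprehensively. Experiments show that in noisy environments NMFCC achieves higher recognition rates than Fusion1 and Fusion2, by varying margins.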
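The smoothing step described in the abstract can be sketched for a single frame as follows. This is a minimal, dependency-free Python illustration under stated assumptions and not the thesis's implementation. The helper names (`magnitude_spectrum`, `moving_average`, `mel_filterbank`, `smoothed_mfcc`) are hypothetical. The 5-bin smoothing width, 20 filters, 13 coefficients, and 8 kHz sample rate are illustrative choices that the abstract does not specify. Following the keyword "smoothed amplitude-spectrum envelope", the moving average is applied to the magnitude spectrum (an assumption). The DFT is computed naively for clarity.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def magnitude_spectrum(frame):
    """|DFT| of a Hamming-windowed frame for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    x = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, s in enumerate(frame)]
    return [abs(sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n)))
            for k in range(n // 2 + 1)]


def moving_average(spec, width=5):
    """Centered moving average over `width` bins; the window shrinks at the edges."""
    half = width // 2
    out = []
    for k in range(len(spec)):
        lo, hi = max(0, k - half), min(len(spec), k + half + 1)
        out.append(sum(spec[lo:hi]) / (hi - lo))
    return out


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters spaced uniformly on the Mel scale from 0 to fs/2."""
    n_fft = 2 * (n_bins - 1)
    top = hz_to_mel(sample_rate / 2.0)
    edges = [int((n_fft + 1) * mel_to_hz(top * i / (n_filters + 1)) / sample_rate)
             for i in range(n_filters + 2)]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):  # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right + 1):  # falling edge of the triangle
            row[k] = (right - k) / max(right - center, 1)
        bank.append(row)
    return bank


def smoothed_mfcc(frame, sample_rate, n_filters=20, n_ceps=13, smooth_width=5):
    """MFCC computed from a moving-average-smoothed magnitude spectrum (hypothetical helper)."""
    # Smooth the spectrum first, so the Mel filters see an envelope approximation.
    envelope = moving_average(magnitude_spectrum(frame), smooth_width)
    bank = mel_filterbank(n_filters, len(envelope), sample_rate)
    log_e = [math.log(sum(w * s for w, s in zip(row, envelope)) + 1e-10) for row in bank]
    # DCT-II of the log filter-bank energies gives the cepstral coefficients.
    return [sum(e * math.cos(math.pi * q * (j + 0.5) / n_filters) for j, e in enumerate(log_e))
            for q in range(n_ceps)]


if __name__ == "__main__":
    fs = 8000
    # Toy voiced frame: 200 Hz fundamental plus one harmonic (illustrative only).
    frame = [math.sin(2 * math.pi * 200 * i / fs) + 0.5 * math.sin(2 * math.pi * 400 * i / fs)
             for i in range(256)]
    print(len(smoothed_mfcc(frame, fs)))  # 13
```

With a 256-point frame at 8 kHz, the bin spacing is 31.25 Hz, so harmonics of a 200 Hz pitch sit about 6.4 bins apart. Widening `smooth_width` flattens more of that harmonic ripple before Mel filtering, at the cost of blurring formant detail.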
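The abstract replaces the Hamming-windowed DFT with a multitaper spectral estimate, which can be sketched as below. It does not say which tapers or weights the thesis uses. This sketch assumes K = 4 sine tapers (the Riedel-Sidorenko family) averaged with equal weights; Thomson's DPSS tapers are a common alternative. The function names are hypothetical.

```python
import cmath
import math


def sine_tapers(n, k):
    """K orthonormal sine tapers of length n (Riedel-Sidorenko form)."""
    scale = math.sqrt(2.0 / (n + 1))
    return [[scale * math.sin(math.pi * j * (i + 1) / (n + 1)) for i in range(n)]
            for j in range(1, k + 1)]


def multitaper_spectrum(frame, n_tapers=4):
    """Average of the tapered periodograms |DFT(x * w_j)|^2 for bins 0..N/2."""
    n = len(frame)
    est = [0.0] * (n // 2 + 1)
    for taper in sine_tapers(n, n_tapers):
        x = [s * w for s, w in zip(frame, taper)]
        for b in range(n // 2 + 1):
            coef = sum(x[i] * cmath.exp(-2j * math.pi * b * i / n) for i in range(n))
            est[b] += abs(coef) ** 2 / n_tapers  # uniform weights (assumption)
    return est


if __name__ == "__main__":
    n = 64
    frame = [math.cos(2 * math.pi * 8 * i / n) for i in range(n)]  # tone centred on bin 8
    spec = multitaper_spectrum(frame)
    print(max(range(len(spec)), key=spec.__getitem__))  # index of the strongest bin
```

Averaging several nearly uncorrelated tapered periodograms lowers the variance of the estimate compared with a single windowed periodogram. This is the usual motivation for multitaper estimation; the price is a slightly wider spectral peak. In an MTSMFCC-style pipeline, this power estimate would stand in for the Hamming-windowed spectrum before smoothing and Mel filtering. Take its square root if the magnitude rather than the power spectrum is smoothed.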
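The fusion steps in the abstract can be illustrated with the sketch below, with loud caveats. The abstract does not say how features are fused, so this assumes plain frame-wise concatenation; the thesis may instead weight or select dimensions. The delta uses the common regression formula over ±2 frames. The inverted (IMFCC) filter bank is built by flipping a Mel filter bank in both filter order and frequency, so its filters become dense at high frequencies. The MidMFCC bank is not reproduced because the abstract gives no definition. All names are hypothetical. Only Fusion1 is demonstrated. Fusion2 would concatenate MFCC, IMFCC, and MidMFCC streams the same way. The abstract does not state how NMFCC combines Fusion1 and Fusion2, for example whether the static MFCC dimensions they share are kept twice, so that step is left out.

```python
def deltas(frames, n=2):
    """First-order regression deltas: d_t = sum_i i*(c[t+i] - c[t-i]) / (2*sum_i i^2).

    Frames beyond either end are replaced by the nearest edge frame.
    """
    last = len(frames) - 1
    denom = 2.0 * sum(i * i for i in range(1, n + 1))
    out = []
    for t in range(len(frames)):
        row = []
        for c in range(len(frames[t])):
            num = sum(i * (frames[min(t + i, last)][c] - frames[max(t - i, 0)][c])
                      for i in range(1, n + 1))
            row.append(num / denom)
        out.append(row)
    return out


def invert_filterbank(bank):
    """Inverted (IMFCC-style) filter bank: flip the Mel bank in filter order and frequency."""
    return [row[::-1] for row in bank[::-1]]


def fuse(*streams):
    """Frame-wise concatenation of feature streams with equal frame counts (assumed fusion rule)."""
    if len({len(s) for s in streams}) != 1:
        raise ValueError("all feature streams must have the same number of frames")
    return [[v for s in streams for v in s[t]] for t in range(len(streams[0]))]


if __name__ == "__main__":
    mfcc = [[float(t), 2.0 * t] for t in range(10)]  # toy 2-D "MFCC" ramp, one row per frame
    fusion1 = fuse(mfcc, deltas(mfcc))  # Fusion1-style: static + delta
    print(fusion1[5])  # [5.0, 10.0, 1.0, 2.0]
```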
[Degree-granting institution]: Taiyuan University of Technology
[Degree level]: Master's
[Year degree conferred]: 2015
[Classification number]: TN912.34