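Speaker Feature Extraction with Phoneme Effect Suppression Based on Speech Frequency Characteristics
Published: 2018-03-24 10:17
Topic: speaker identification. Focus: distribution of speaker-specific information across phonemes. Source: Tianjin University, doctoral dissertation, 2014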
[Abstract]: Speech carries both linguistic information and speaker-specific information. Linguistic information reflects what speakers have in common, while speaker-specific information reflects what distinguishes one speaker from another. Speaker recognition therefore needs to preserve speaker-specific information while suppressing linguistic information. In a speech signal, however, the two are difficult to separate. To reduce the effect of differences in spoken content on speaker recognition, this thesis proposes the Phoneme Effect Suppression (PES) method, which emphasizes differences in speaker-specific information.
To obtain an accurate distribution of speaker information in the frequency domain, the thesis first studies the frequency characteristics of speech. By computing each phoneme's contribution to speaker-specific information in each frequency sub-band (Phoneme F-ratio Contribution, PFC), it derives the distribution of speaker information for different phonemes. Speech is shaped by the speaker's vocal organs and by the manner and place of articulation, so the distribution of speaker information for each phoneme reflects the individuality of specific articulatory organs and articulation manners. The acoustic expression of speaker-specific information is studied separately in three languages (English, Chinese, and Korean). Measuring each phoneme's contribution in each sub-band shows that voiced sounds, unvoiced sounds, and nasals each have a different distribution of speaker-specific information.
On this basis, the thesis proposes the PES method, which suppresses the effect of different phonemes on speaker individuality and yields the distribution of speaker-specific information in the frequency domain (Phoneme Effect Suppressed Speaker Information Distribution, PES-SID).
Finally, the thesis proposes a new speaker feature extraction method built on a non-uniform frequency scale derived from the distribution of speaker-specific information. The proposed feature is used with a GMM speaker model in speaker identification experiments and compared with two other speaker features. The results show that the proposed feature outperforms both. Compared with MFCC (Mel Frequency Cepstrum Coefficient), it reduces the identification error rate for every language tested: by 61.1% for English, 68.0% for Korean, and 32.9% for Chinese. Compared with FFCC (F-ratio Frequency Cepstrum Coefficient), the error rate is reduced by 30% (English), 28.5% (Korean), and 6.6% (Chinese). These results indicate that the proposed feature also has a degree of robustness in speaker discrimination across languages.
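The abstract does not give the formulas behind PFC or PES-SID, so the following is only a minimal sketch of the general idea under stated assumptions. It uses the standard F-ratio (variance of the speaker means divided by the average within-speaker variance) per sub-band, and then places filter centres so that each filter covers an equal share of that distribution. The function names `f_ratio` and `warped_centers`, their parameters, and the equal-share placement rule are hypothetical illustrations, not the thesis's method. Restricting the input frames to a single phoneme would give a per-phoneme curve in the spirit of PFC.

```python
import numpy as np


def f_ratio(band_energies, speaker_ids):
    """Per-band F-ratio: between-speaker variance / mean within-speaker variance.

    band_energies: array of shape (n_frames, n_bands), e.g. log sub-band energies.
    speaker_ids:   array of shape (n_frames,), one speaker label per frame.
    Returns an array of shape (n_bands,).
    """
    band_energies = np.asarray(band_energies, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    speakers = np.unique(speaker_ids)

    # Mean and variance of every band, computed separately for each speaker.
    means = np.stack([band_energies[speaker_ids == s].mean(axis=0) for s in speakers])
    within = np.stack([band_energies[speaker_ids == s].var(axis=0) for s in speakers])

    between = means.var(axis=0)          # spread of the speaker means
    return between / within.mean(axis=0)  # large value = band separates speakers well


def warped_centers(distribution, band_edges_hz, n_filters):
    """Place filter centres so each filter covers an equal share of `distribution`.

    distribution:  array of shape (n_bands,), strictly positive weights per band
                   (assumption: for example the output of f_ratio).
    band_edges_hz: array of shape (n_bands + 1,), increasing band boundaries in Hz.
    Returns an array of shape (n_filters,) with centre frequencies in Hz.
    """
    weights = np.asarray(distribution, dtype=float)
    weights = weights / weights.sum()
    # Cumulative share of speaker information at each band edge, from 0 to 1.
    cdf = np.concatenate([[0.0], np.cumsum(weights)])
    # Equally spaced targets on the cumulative axis, mapped back to frequency.
    targets = (np.arange(n_filters) + 0.5) / n_filters
    return np.interp(targets, cdf, band_edges_hz)
```

With a flat distribution the centres come out evenly spaced; where the distribution peaks, the centres crowd together, which gives finer frequency resolution in the bands that discriminate speakers best.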
[Degree-granting institution]: Tianjin University
[Degree level]: Doctorate
[Year degree awarded]: 2014
[Classification number]: TN912.3
Article ID: 1657858
Article link: http://sikaile.net/kejilunwen/wltx/1657858.html