
Speaker Feature Extraction Based on Speech Frequency Characteristics with Suppression of Phoneme Effects

Published: 2018-03-24 10:17

  Topic: speaker identification   Focus: distribution of speaker-specific information across phonemes   Source: Tianjin University, 2014 doctoral dissertation


【Abstract】: Speech carries both linguistic information and speaker-specific (personal) information: linguistic information represents what speakers have in common, while personal information represents what distinguishes an individual speaker. Speaker recognition therefore needs to preserve the speaker's personal information while suppressing linguistic information. However, the two are difficult to separate in the speech signal. To reduce the influence of differences in spoken content on speaker recognition, this thesis proposes a Phoneme Effect Suppression (PES) method that emphasizes differences in speaker-specific information.

To obtain an accurate distribution of speaker information in the frequency domain, the thesis first studies the frequency characteristics of speech. By measuring each phoneme's contribution to speaker-specific information in every sub-band (the Phoneme F-ratio Contribution, PFC), we obtain the distribution of speaker information for each phoneme. Speech is shaped by the speaker's vocal organs and by the manner and place of articulation, so the distribution of speaker information within each phoneme reflects the individuality of particular articulators and articulation patterns. The acoustic expression of speaker-specific information is studied separately in three languages (English, Chinese, and Korean). Measuring each phoneme's per-sub-band contribution shows that voiced sounds, unvoiced sounds, and nasals each exhibit a different distribution of speaker-specific information.

On this basis, the thesis proposes the PES method, which suppresses the influence of different phonemes on speaker individuality and yields the Phoneme Effect Suppressed Speaker Information Distribution (PES-SID) in the frequency domain.

Finally, the thesis proposes a new method for extracting speaker features, centered on a non-uniform frequency scale derived from the speaker information distribution. The proposed features were used with a GMM speaker model in speaker identification experiments and compared with two other feature sets; the results show that the proposed features outperform both. Compared with MFCC (Mel Frequency Cepstrum Coefficient) features, the proposed features reduce the identification error rate for every language tested: by 61.1% for English, 68.0% for Korean, and 32.9% for Chinese. Compared with FFCC (F-ratio Frequency Cepstrum Coefficient) features, the error rate is reduced by 30% (English), 28.5% (Korean), and 6.6% (Chinese). These results indicate that the proposed features provide robust speaker discrimination across different languages.
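The abstract does not spell out how the Phoneme F-ratio Contribution (PFC) is computed, so the following is only a minimal sketch of the general idea under stated assumptions: the F-ratio (between-speaker variance divided by within-speaker variance) of sub-band features is computed separately for each phoneme, and the per-phoneme curves are then averaged so that no single phoneme dominates, a rough stand-in for phoneme effect suppression. The function names and input layout are hypothetical illustrations, not the thesis implementation.

```python
# Hedged sketch of a per-phoneme, per-sub-band F-ratio analysis (assumed inputs:
# per-speaker matrices of sub-band energies, grouped by phoneme label).
import numpy as np

def f_ratio_per_subband(feats_by_speaker):
    """feats_by_speaker: list of (n_frames_i, n_subbands) arrays, one per speaker.
    Returns an (n_subbands,) array of F-ratios (between-speaker variance of the
    speaker means divided by the mean within-speaker variance)."""
    means = np.stack([f.mean(axis=0) for f in feats_by_speaker])      # speaker means
    grand = means.mean(axis=0)
    between = ((means - grand) ** 2).mean(axis=0)                     # between-speaker variance
    within = np.stack([f.var(axis=0) for f in feats_by_speaker]).mean(axis=0)
    return between / np.maximum(within, 1e-12)

def speaker_info_distribution(feats_by_phoneme_speaker):
    """feats_by_phoneme_speaker: dict phoneme -> list of per-speaker feature arrays.
    Averages the per-phoneme F-ratio curves so that no single phoneme dominates,
    i.e. a rough stand-in for 'phoneme effect suppression'."""
    curves = np.stack([f_ratio_per_subband(v)
                       for v in feats_by_phoneme_speaker.values()])
    return curves.mean(axis=0)    # (n_subbands,) speaker-information distribution
```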
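Likewise, a minimal sketch, under assumptions, of how a speaker-information distribution over frequency could define the non-uniform frequency scale mentioned in the abstract: filter centers are placed by inverse-CDF warping so that informative regions receive denser filters, log filter-bank energies are decorrelated with a DCT as in MFCC extraction, and closed-set identification picks the enrolled speaker whose GMM gives the test utterance the highest log-likelihood (scikit-learn's GaussianMixture stands in for the thesis's GMM speaker model). All names and parameter values below are illustrative.

```python
# Hedged illustration (assumptions, not the thesis implementation) of a
# non-uniform filter-bank driven by a speaker-information curve, MFCC-style
# cepstra, and closed-set GMM speaker identification.
import numpy as np
from scipy.fftpack import dct
from sklearn.mixture import GaussianMixture

def warped_centers(info_curve, freqs, n_filters):
    # Inverse-CDF warping: denser filter centers where the information curve is high.
    cdf = np.cumsum(info_curve) / info_curve.sum()
    targets = np.linspace(0.0, 1.0, n_filters + 2)
    return np.interp(targets,
                     np.concatenate(([0.0], cdf)),
                     np.concatenate(([freqs[0]], freqs)))

def cepstra(power_spec, centers, freqs, n_ceps=13):
    # Triangular filters at the warped centers, then log energies + DCT.
    fb = np.zeros((len(centers) - 2, len(freqs)))
    for i in range(1, len(centers) - 1):
        lo, c, hi = centers[i - 1], centers[i], centers[i + 1]
        fb[i - 1] = np.clip(np.minimum((freqs - lo) / (c - lo + 1e-9),
                                       (hi - freqs) / (hi - c + 1e-9)), 0, None)
    log_energy = np.log(power_spec @ fb.T + 1e-10)     # (n_frames, n_filters)
    return dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_ceps]

def identify(test_feats, speaker_gmms):
    # Closed-set identification: the enrolled model with the highest log-likelihood wins.
    scores = {spk: gmm.score(test_feats) for spk, gmm in speaker_gmms.items()}
    return max(scores, key=scores.get)

# Enrollment sketch: speaker_gmms[spk] = GaussianMixture(32, covariance_type='diag').fit(feats)
```

The inverse-CDF placement simply allocates filter density in proportion to the measured speaker information; the exact warping function used in the thesis may differ.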
【Degree-granting institution】: Tianjin University
【Degree level】: Doctorate
【Year awarded】: 2014
【CLC number】: TN912.3




Article No.: 1657858


Link: http://sikaile.net/kejilunwen/wltx/1657858.html


