
Research on Whispered-Speech Speaker Recognition Based on Joint Factor Analysis

Published: 2018-08-01 17:13
[Abstract]: Speaker recognition, an important branch of biometric recognition, has wide application in public security and forensics, biomedical engineering, military security systems, and other fields. With the rapid development of computer and network technology, speaker recognition has made great progress. Whispering is a special mode of spoken communication used on many occasions. Because whispered speech differs substantially from normally phonated speech, speaker recognition for whisper cannot simply reuse the methods developed for normal speech, and many problems remain to be solved.
This thesis investigates text-independent speaker recognition for whispered speech. The main difficulties are: (1) incomplete whispered-speech databases — for normal speech, the U.S. National Institute of Standards and Technology provides standard corpora for speaker recognition research, whereas comparable whisper resources are scarce; (2) feature representation — owing to the peculiarity of whispered phonation, some commonly used feature parameters cannot be extracted, and spectral parameters are harder to obtain than for normal speech; (3) whispered speech is produced with a breathy, unvoiced excitation at a low sound level, so it is easily corrupted by noise, and since whisper is often used during mobile-phone calls it is also sensitive to the channel environment; (4) whispering is constrained by the speaking situation, so emotional expression is limited, and phonation state and psychological factors vary, making the speech more susceptible to the speaker's mental state, emotion, and phonation condition. Compared with normal phonation, then, the main challenges for whispered-speech speaker recognition are that feature parameters are harder to extract, the speech is more affected by the speaker's own state, and the task is more sensitive to channel variation.
To address these problems, this thesis carries out the following work:
1. A parameter-extraction algorithm that captures speaker characteristics in whispered speech. Whispered speech has no fundamental frequency, so voice-source features are hard to exploit; the reliability of formant extraction, which characterizes the vocal tract, is therefore especially important. This thesis proposes a formant-extraction algorithm for whispered speech based on spectral segmentation: the spectrum is segmented dynamically, filter parameters are obtained by selective linear prediction, and formants are derived through parallel inverse-filter control. The method offers an effective way to handle the formant shifting, merging, and flattening caused by whispered phonation. In addition, based on the observation that the statistical centroid and flatness of a variable measure a signal's stability, and drawing on models of human auditory perception, the thesis introduces the Bark-subband spectral centroid and Bark-subband spectral flatness; combined with other spectral variables, this feature set effectively characterizes the speaker under whispered phonation.
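The Bark-subband centroid and flatness features can be sketched as below. This is a minimal illustration, not the thesis's exact configuration: the band count, the Traunmüller approximation of the Bark scale, and the single-frame interface are all assumptions made for the example.

```python
import numpy as np

def hz_to_bark(f):
    # Traunmueller's approximation of the Bark scale (an assumed choice)
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_subband_features(frame, sr, n_bands=17):
    """Per-Bark-band spectral centroid and flatness of one windowed frame."""
    spec = np.abs(np.fft.rfft(frame)) ** 2              # power spectrum
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    bark = hz_to_bark(freqs)
    # equal-width bands on the Bark axis (DC bin excluded)
    edges = np.linspace(bark[1], bark[-1], n_bands + 1)
    centroids, flatness = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = (bark >= lo) & (bark < hi)
        p, f = spec[idx], freqs[idx]
        if p.size == 0 or p.sum() <= 0:
            centroids.append(0.0)
            flatness.append(0.0)
            continue
        # centroid: power-weighted mean frequency inside the band
        centroids.append(float((f * p).sum() / p.sum()))
        # flatness: geometric mean over arithmetic mean of the power bins
        gm = np.exp(np.mean(np.log(p + 1e-12)))
        flatness.append(float(gm / (np.mean(p) + 1e-12)))
    return np.array(centroids), np.array(flatness)
```

Flatness near 1 indicates a noise-like (flat) band, near 0 a peaky, tonal band — which is why, together with the centroid, it tracks the stability of the whispered spectrum.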
2. A whispered-speech speaker recognition method for atypical emotional states, based on feature mapping and speaker model synthesis. It largely resolves the mismatch between the speaker's emotional state in training and test speech. Because whispered speech conveys emotion less effectively than normal speech and does not admit a clear-cut emotion classification, the thesis classifies the speaker's state along arousal and valence (A, V) factors, relaxing the one-to-one mapping to discrete emotions. At test time, as a front-end processing step, the speaker state of each utterance is identified, and compensation is then applied in the feature domain or the model domain. Experiments show that this state-compensation scheme not only reflects the specific nature of whispered speech but also effectively improves recognition accuracy for whisper under atypical emotion.
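As an illustration of feature-domain compensation after the state of an utterance has been identified, a per-dimension affine mapping from the detected state's statistics to neutral statistics might look as follows. The thesis's actual feature mapping is trained on whispered data, so the affine form and the statistics interface here are simplifying assumptions.

```python
import numpy as np

def map_to_neutral(feats, state_stats, neutral_stats):
    """Map features (T x D) from a detected speaker state to neutral.

    state_stats / neutral_stats are (mean, std) pairs over the feature
    dimensions; each dimension is shifted and rescaled independently.
    """
    mu_s, sd_s = state_stats
    mu_n, sd_n = neutral_stats
    return (feats - mu_s) / (sd_s + 1e-8) * sd_n + mu_n
```

After mapping, the compensated features can be scored against speaker models trained on neutral whispered speech, which is the essence of the mismatch reduction described above.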
3. A whispered-speech speaker recognition method for atypical emotion based on latent factor analysis, providing an effective route to speaker-state compensation. Factor analysis does not concern itself with the physical meaning of the common factors; it merely extracts representative factors from many variables, and the algorithm's complexity can be tuned by adding or removing factors. Under the latent-factor model, the whispered-speech feature supervector is decomposed into a speaker supervector and a speaker-state supervector; the speaker and speaker-state subspaces are estimated from balanced training data, and at test time the speaker factors of each utterance are estimated and used for the decision. Because latent factor analysis avoids explicit speaker-state classification during testing, it further improves the recognition rate over compensation algorithms that depend on such a classifier.
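A toy version of this supervector decomposition treats factor estimation as a least-squares problem: with a mean supervector m, speaker subspace V, and state subspace S, an observed supervector M is modeled as M = m + V·y + S·z. Real systems use MAP point estimates with Gaussian priors on the factors, so the plain least-squares solver and the matrix names below are simplifying assumptions.

```python
import numpy as np

def estimate_factors(M, m, V, S):
    """Least-squares estimate of speaker factors y and state factors z
    for the simplified model M = m + V @ y + S @ z.

    M: observed supervector (CF,); m: UBM mean supervector (CF,)
    V: speaker subspace (CF x r); S: speaker-state subspace (CF x s)
    """
    A = np.hstack([V, S])                       # joint loading matrix
    sol, *_ = np.linalg.lstsq(A, M - m, rcond=None)
    r = V.shape[1]
    return sol[:r], sol[r:]                     # speaker, state factors
```

The point of the decomposition is that the decision uses only the speaker factors y; the state factors z absorb the emotional/state variability, so no explicit state classifier is needed at test time.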
4. A multi-channel, atypical-emotion whispered-speech speaker recognition method based on joint factor analysis, achieving joint compensation of channel and speaker state. Following the basic idea of joint factor analysis, the speech feature supervector is decomposed into a speaker supervector, a speaker-state supervector, and a channel supervector. Because the whispered-speech training data are insufficient to estimate the speaker, speaker-state, and channel subspaces simultaneously, the method first obtains a universal background model (UBM) and computes the Baum-Welch statistics of the speech, then estimates the speaker subspace, and finally estimates the speaker-state and channel subspaces in parallel. At test time, the channel and speaker-state offsets are subtracted from the feature vectors, and the transformed features are used for speaker recognition. Experimental results show that the joint-factor-analysis method compensates channel and speaker state simultaneously and achieves better recognition than the other algorithms considered.
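The Baum-Welch statistics mentioned above are the zeroth- and first-order sufficient statistics of an utterance's frames against the UBM; they are the inputs to all subspace estimation in joint factor analysis. A minimal sketch against a diagonal-covariance UBM follows; the parameter layout (weights, means, diagonal covariances as arrays) is an assumption for the example.

```python
import numpy as np

def baum_welch_stats(X, weights, means, covs):
    """Zeroth- and first-order Baum-Welch statistics of frames X (T x D)
    against a diagonal-covariance UBM with C components.

    weights: (C,); means: (C x D); covs: (C x D) diagonal variances.
    Returns N (C,) and F (C x D).
    """
    T, D = X.shape
    C = len(weights)
    logp = np.empty((T, C))
    for c in range(C):
        diff = X - means[c]
        logp[:, c] = (np.log(weights[c])
                      - 0.5 * np.sum(np.log(2 * np.pi * covs[c]))
                      - 0.5 * np.sum(diff * diff / covs[c], axis=1))
    logp -= logp.max(axis=1, keepdims=True)      # numerical stabilization
    post = np.exp(logp)
    post /= post.sum(axis=1, keepdims=True)      # per-frame posteriors
    N = post.sum(axis=0)                         # zeroth-order stats
    F = post.T @ X                               # first-order stats
    return N, F
```

From these statistics the subspaces are trained, and at test time the estimated channel and state contributions are subtracted from the utterance's statistics (or features) before scoring, which is the dual compensation described above.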
[Degree-granting institution]: Soochow University
[Degree level]: Doctorate
[Year conferred]: 2014
[CLC number]: TN912.34



