Research on Issues Related to Session Variability Modeling in Speaker Verification

Published: 2018-08-13 18:04

[Abstract]: Speaker verification on long speech segments under complex channel conditions has matured steadily, laying the groundwork for practical use. Global session variability modeling, built on the Gaussian mixture model-universal background model (GMM-UBM), is simple to apply. Combined with an efficient back-end model that compensates for non-speaker variability, such as Probabilistic Linear Discriminant Analysis (PLDA), it has been the mainstream technique in the speaker recognition field since it was proposed. The idea is to take whatever in a given speech segment departs from the information common to all speech segments (mainly the spoken content), that is, the segment's session variability, and express it as a fixed-length low-dimensional vector, the session variability vector. This vector contains the variability other than spoken content, both non-speaker variability (mainly channel variability) and speaker variability. Further modeling on top of the vector is needed to remove the effect of speaker-independent variability on speaker discrimination. The core problems of global session variability modeling are therefore extracting the session variability of a speech segment and compensating for non-speaker variability at the back end, so that the speaker information useful for discrimination can be recovered. This thesis studies the modeling of session variability and the extraction and discrimination of speaker variability. The main contributions are as follows.

First, global session variability modeling yields an overall representation of the session variability in a speech segment but ignores its finer details. We therefore propose local session variability modeling to extract the local variability that global modeling cannot express, and use it for speaker verification. Local modeling extracts the session variability contained in each Gaussian component and in each dimension of the acoustic features, giving Gaussian-local and dimension-local variability modeling respectively. Within dimension-local modeling, we further propose tying the acoustic feature dimensions together in different ways and extracting the local session variability of each tied group of dimensions. Because the local and global models describe session variability from different angles, they are to some degree complementary, so they can be fused at both the system level and the model level to outperform either model alone.

Second, global session variability modeling works well when the test speech and the speech used for model training are consistent in text, as in text-independent verification on long segments and text-dependent verification on short segments. When the text is mismatched, as in text-independent verification on short segments, global modeling cannot model session variability with respect to text, and speaker discrimination is affected by text differences. Building on the idea of local session variability modeling, we use a deep neural network acoustic model from speech recognition to cluster speech frames by phoneme state, and extract phoneme-related local session variability on that basis. We use both monophone and triphone acoustic models. From the local session variability vectors extracted for the different phonemes, we select those corresponding to the phonemes present in the test segment and use them for speaker discrimination, which makes the discrimination phoneme-dependent and addresses the text mismatch in short segments. Going beyond phonemes, we also explore using the output of a speech recognizer to extract session variability and discriminate speakers at the word level, rounding out the study of content-based session variability extraction and speaker discrimination.

Third, PLDA, the mainstream back-end channel compensation technique for global session variability models, is a linear probabilistic model, and we study a series of improvements to the back-end channel compensation model. To begin with, for PLDA we propose a scoring model based on an adapted speaker model that is equivalent to the existing speaker scoring model. On that basis, we address two phenomena in multi-segment speaker enrollment: different speakers enroll with different numbers of segments, and the segments of a single speaker overlap to varying degrees. We propose enrolling the model with the prior distribution parameters of the speaker factor in place of the conventional posterior distribution parameters, to remove the effect of these two problems on scoring. We also add channel adaptation on top of the speaker-adapted model: in each trial, the PLDA model is adapted to the channel space of the test segment before the score is computed, so that the specific information of each trial is taken into account and system performance improves. In addition, we introduce a deep neural network to extract the nonlinear, deep speaker information present in the global session variability vector and use it for speaker discrimination, with the aim of improving system performance.
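To make the front end described in the abstract more concrete, the sketch below computes per-Gaussian zeroth-order and centered first-order statistics of a speech segment against a diagonal-covariance GMM-UBM. Statistics of this kind are the usual input to fixed-length vector extractors built on GMM-UBM. The abstract does not give the thesis's exact formulation, so the function names (`gmm_posteriors`, `baum_welch_stats`), the array shapes, and the diagonal-covariance choice are all assumptions made for illustration. Under the same assumptions, a phoneme-related variant could replace the GMM posteriors with frame-level phoneme-state posteriors from a DNN acoustic model.

```python
import numpy as np

def gmm_posteriors(frames, weights, means, variances):
    """Frame-level component posteriors for a diagonal-covariance GMM.

    Hypothetical helper, not taken from the thesis.
    frames: (T, D); weights: (C,); means, variances: (C, D).
    """
    diff = frames[:, None, :] - means[None, :, :]             # (T, C, D)
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / variances, axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)
    )                                                          # (T, C)
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)            # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def baum_welch_stats(frames, post, means):
    """Zeroth-order counts N_c and mean-centered first-order statistics F_c."""
    n = post.sum(axis=0)                                       # (C,)
    f = post.T @ frames - n[:, None] * means                   # (C, D)
    return n, f

# Toy usage with a random 4-component "UBM" over 3-dimensional features.
rng = np.random.default_rng(0)
ubm_weights = np.full(4, 0.25)
ubm_means = rng.normal(size=(4, 3))
ubm_vars = np.ones((4, 3))
segment = rng.normal(size=(200, 3))                            # 200 frames

posteriors = gmm_posteriors(segment, ubm_weights, ubm_means, ubm_vars)
counts, first_order = baum_welch_stats(segment, posteriors, ubm_means)
```

The soft counts in `counts` sum to the number of frames, and `first_order` has one row per Gaussian component. Restricting the statistics to a single component, or to a subset of feature dimensions, is one plausible way to picture the Gaussian-local and dimension-local views the abstract describes, though the thesis may define them differently.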
[Degree-granting institution]: University of Science and Technology of China
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: TN912.34


Chen Liping; Research on Issues Related to Session Variability Modeling in Speaker Verification [D]; University of Science and Technology of China; 2016


Article ID: 2181768


Article link: http://sikaile.net/shoufeilunwen/xxkjbs/2181768.html
