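Research on Algorithms for Speaker Voiceprint Recognition
Topics: speaker recognition + speaker verification; Source: Zhejiang University, master's thesis, 2017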
[Abstract]: Speaker voiceprint recognition is a means of identity authentication that uses the voice as the identifying feature. Research on and implementation of this technology are important for accelerating the adoption of speaker recognition in commercial applications. Text-independent speaker verification is one research direction within speaker recognition. Mainstream algorithms are based on probabilistic statistical models. When sufficient speech data is available, the GMM-UBM (Gaussian Mixture Model-Universal Background Model) achieves good performance, but under noise and channel mismatch its recognition performance is hard to improve further. To address this, total variability factor (i-vector) analysis was proposed: it maps utterances of varying length to a low-dimensional vector and handles the channel problem in that low-dimensional space. LDA (Linear Discriminant Analysis) and PLDA (Probabilistic Linear Discriminant Analysis) are commonly used channel compensation techniques, although the latter is often used as a scoring tool. This thesis takes the GMM-UBM model as its basic research framework and goes on to study a speaker verification system based on i-vectors and the PLDA model. The main contents are as follows:
(1) For the application of speaker recognition on cloud platforms, a cloud-based speaker recognition system framework is proposed. The speech preprocessing process and the feature extraction pipeline for Mel-frequency cepstral coefficients (MFCC), which are based on human auditory perception, are analyzed.
(2) A speaker recognition system based on the GMM-UBM model is built. The UBM training process and the MAP adaptation and matching process are described in detail. An experimental database is set up to examine how the number of UBM training speakers, the number of Gaussian components in the model, the training speech length, the test speech length, and the MFCC feature dimension affect system performance.
(3) A speaker verification system based on i-vectors and the PLDA model is built, and the i-vector extraction algorithm and the PLDA model are analyzed. Experiments compare the performance of the different systems and examine how the norm transform, the i-vector feature dimension, and the PLDA factor dimension affect system performance.
(4) LDA and WCCN normalization are combined to perform channel compensation and dimensionality reduction on i-vectors, and the effect of this technique on the experimental results is analyzed in depth. To address the weak classification performance of LDA, an improved classification algorithm is proposed and verified experimentally.
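The GMM-UBM step named in the abstract (a speaker model derived from the UBM by MAP adaptation, then scored against the UBM) can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation: it assumes diagonal covariances, adapts only the component means, uses a relevance factor of 16, and runs on synthetic 2-D features in place of MFCCs. All function names are hypothetical.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Log N(x | mean, diag(var)) for every frame and component; shape (T, C)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                 # (T, C, D)
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(variances), axis=1)              # (C,)
    return -0.5 * (D * np.log(2 * np.pi) + log_det[None, :] + maha)

def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def avg_log_likelihood(X, weights, means, variances):
    """Average per-frame log-likelihood of X under a diagonal GMM."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    return float(np.mean(logsumexp(log_p, axis=1)))

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a UBM to enrollment frames X (assumed variant)."""
    log_p = np.log(weights)[None, :] + log_gauss_diag(X, means, variances)
    post = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])  # responsibilities (T, C)
    n = post.sum(axis=0)                                      # soft counts (C,)
    ex = (post.T @ X) / np.maximum(n, 1e-10)[:, None]         # per-component data mean
    alpha = (n / (n + relevance))[:, None]                    # data-dependent mixing weight
    return alpha * ex + (1.0 - alpha) * means

def llr_score(X, weights, ubm_means, spk_means, variances):
    """Verification score: speaker-model minus UBM average log-likelihood."""
    return (avg_log_likelihood(X, weights, spk_means, variances)
            - avg_log_likelihood(X, weights, ubm_means, variances))

# Toy usage with synthetic 2-D "features" (illustrative values only).
rng = np.random.default_rng(0)
ubm_w = np.array([0.5, 0.5])
ubm_mu = np.array([[-2.0, -2.0], [2.0, 2.0]])
ubm_var = np.ones((2, 2))
enroll = rng.normal(3.0, 1.0, size=(200, 2))       # enrollment speech of the target
spk_mu = map_adapt_means(enroll, ubm_w, ubm_mu, ubm_var)
target_trial = rng.normal(3.0, 1.0, size=(200, 2))
impostor_trial = rng.normal(1.0, 1.0, size=(200, 2))
print("target LLR:  ", llr_score(target_trial, ubm_w, ubm_mu, spk_mu, ubm_var))
print("impostor LLR:", llr_score(impostor_trial, ubm_w, ubm_mu, spk_mu, ubm_var))
```

Components that see little enrollment data keep their UBM means, because the mixing weight `alpha` shrinks toward zero as the soft count drops. A trial is accepted when the score exceeds a threshold chosen on development data.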
[Degree-granting institution]: Zhejiang University
[Degree level]: Master's
[Year degree conferred]: 2017
[Classification number]: TN912.34
Article ID: 1961701
Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/1961701.html