Research on SVM-Based Text-Independent Speaker Identification
Published: 2018-05-21 02:32
Topics: speaker recognition + Gaussian mixture model; Source: Nanjing University of Posts and Telecommunications, master's thesis, 2017
[Abstract]: Speech is the most effective means of human communication, and its uniqueness to each speaker makes it the basic foundation of speaker recognition technology. Within the basic speaker recognition framework, a current research focus is finding highly discriminative speaker-specific features that raise system performance. Model selection and feature extraction are the key concerns of speaker recognition: once the model has been chosen, the performance of a speaker recognition system depends mainly on which type of feature parameters is used. Finding superior speaker-specific features is therefore of considerable theoretical and practical significance. The goal of this thesis is to design speech features that either improve the recognition performance of a speaker recognition system or reduce its time complexity. To this end, the thesis studies the properties of the GMM supervector in speaker recognition systems, proposes a recombined supervector on that basis, and analyzes the feasibility of the recombined supervector in light of the properties of the support vector machine (SVM). It then turns to deep learning, which has become popular in recent years, and designs a deep neural network (DNN) to extract bottleneck features from a speaker's speech. The main work and contributions are as follows:
(1) The thesis introduces the basic framework of speaker recognition, covering speech preprocessing, feature extraction, and speaker recognition models. It describes in detail how LPC and MFCC features and their cepstral variants are extracted, and analyzes their properties. It also introduces several classical speaker recognition methods: template matching, hidden Markov models, vector quantization, Gaussian mixture models (GMM), support vector machines, and deep neural networks. Preliminary studies showed that the last three methods perform comparatively better in speaker recognition systems, so the research in this thesis builds on those three.
(2) Because the traditional supervector does not perform well enough in speaker identification systems, the thesis proposes a text-independent GMM-SVM speaker identification system built on a recombined supervector. The recombined supervector exploits the high correlation between the mean vectors of adjacent Gaussian components, while the mean vector of each Gaussian component carries sufficient speaker-specific information. It therefore captures the intrinsic details of a speaker's identity and lets the system take full advantage of the SVM's strength on high-dimensional, small-sample data. Experimental results show that, compared with a conventional GMM-SVM speaker identification system, the system based on the recombined supervector effectively improves the speaker identification rate while substantially shortening model-building time.
(3) Because traditional feature parameters cannot capture the deeper structural information in speech signals, the thesis designs a deep neural network to extract bottleneck features from a speaker's speech and builds a DNN-SVM speaker identification system. These features capture deep speaker characteristics and are invariant and highly discriminative. Experimental results show that the DNN-SVM speaker identification system achieves clearly better recognition performance than the SVM-based speaker identification system.
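As a rough illustration of the MFCC extraction covered in part (1), the sketch below follows the textbook pipeline: pre-emphasis, framing with a Hamming window, power spectrum, triangular mel filterbank, log compression, and a DCT-II. It uses NumPy only. The function names and every parameter value (16 kHz audio, 25 ms frames with a 10 ms hop, a 0.97 pre-emphasis coefficient, 26 filters, 13 coefficients) are common defaults assumed here, not settings taken from the thesis.

```python
import numpy as np

def hz_to_mel(f):
    # Widely used mel-scale formula: 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale; shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    i = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    scale = np.full(n_out, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    """MFCCs of a 1-D signal (assumed at least frame_len samples long); shape (n_frames, n_ceps)."""
    signal = np.asarray(signal, dtype=float)
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)                   # windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft           # power spectrum
    energies = power @ mel_filterbank(n_filters, n_fft, sr).T           # mel band energies
    log_energies = np.log(np.maximum(energies, 1e-10))                  # guard against log(0)
    return dct_ii(log_energies, n_ceps)                                 # cepstral coefficients
```

For one second of a 440 Hz tone at 16 kHz, `mfcc(...)` returns a 98 × 13 array, one row per 10 ms hop. Implementing the DCT as a matrix product keeps the sketch free of dependencies beyond NumPy.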
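The LPC and LPC-cepstrum features mentioned in part (1) can be sketched in the same way. The autocorrelation method with the Levinson-Durbin recursion gives predictor coefficients a_k (under the convention s[n] ≈ Σ a_k s[n-k]). The standard recursion c_n = a_n + Σ_{k=1}^{n-1} (k/n) c_k a_{n-k}, with a_j taken as 0 for j beyond the model order p, then converts them to cepstral coefficients. This is a generic NumPy sketch under those conventions, not the thesis's code. Windowing and the gain term c_0 are omitted, and the function names are hypothetical.

```python
import numpy as np

def lpc(frame, order):
    """Predictor coefficients a_1..a_p via autocorrelation + Levinson-Durbin."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)          # a[0] unused so that a[j] holds a_j
    err = r[0]                       # current prediction-error energy
    for i in range(1, order + 1):
        # Reflection coefficient: k_i = (r_i - sum_{j<i} a_j r_{i-j}) / err
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        new_a = a.copy()
        new_a[i] = k
        new_a[1:i] = a[1:i] - k * a[i - 1:0:-1]   # a_j <- a_j - k_i a_{i-j}
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def lpc_to_cepstrum(a, n_ceps):
    """LPC cepstral coefficients c_1..c_n_ceps from predictor coefficients a_1..a_p."""
    p = len(a)
    c = np.zeros(n_ceps + 1)         # c[0] (log gain) left at zero
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):   # only terms with 1 <= n - k <= p
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

As a sanity check, a single pole gives `lpc_to_cepstrum(np.array([0.5]), 3)` equal to 0.5, 0.125 and about 0.0417, matching the closed form c_n = a^n / n.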
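For part (2), the conventional GMM supervector that the recombined supervector is compared against is commonly built by MAP-adapting the means of a universal background model (UBM) to one utterance and stacking the adapted means into one long vector. The sketch below shows only that conventional construction. It assumes a diagonal-covariance UBM, a relevance factor of 16, and an optional sqrt(w_m)·Σ_m^(-1/2) scaling of the kind used with a linear KL-divergence-based kernel. It does not reproduce the thesis's recombined supervector, since the exact recombination of adjacent components is not specified here.

```python
import numpy as np

def log_gaussian_diag(X, means, variances):
    """Log-density of each frame under each diagonal Gaussian; shape (T, M)."""
    D = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                 # (T, M, D)
    mahal = np.sum(diff ** 2 / variances[None], axis=2)      # (T, M)
    log_det = np.sum(np.log(variances), axis=1)              # (M,)
    return -0.5 * (D * np.log(2 * np.pi) + log_det[None, :] + mahal)

def map_supervector(X, weights, means, variances, relevance=16.0, kl_normalize=True):
    """Stack MAP-adapted UBM means for utterance X (T, D) into a vector of length M * D."""
    log_p = np.log(weights)[None, :] + log_gaussian_diag(X, means, variances)
    log_p -= log_p.max(axis=1, keepdims=True)                # stabilise before exp
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)                  # responsibilities gamma_tm
    n = post.sum(axis=0)                                     # zeroth-order statistics (M,)
    first = post.T @ X                                       # first-order statistics (M, D)
    ex = first / np.maximum(n, 1e-10)[:, None]               # per-component data means
    alpha = (n / (n + relevance))[:, None]                   # data-dependent adaptation weight
    adapted = alpha * ex + (1.0 - alpha) * means             # MAP-adapted means
    if kl_normalize:
        adapted = np.sqrt(weights)[:, None] * adapted / np.sqrt(variances)
    return adapted.reshape(-1)
```

Components that see almost no frames keep their UBM means (alpha near 0), so every utterance maps to a vector of the same length regardless of its duration. That fixed length is what makes the supervector usable as SVM input.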
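The SVM back end can be sketched in the same spirit. Assuming one fixed-length vector (for example a supervector) per training utterance, the code below trains one linear SVM per enrolled speaker (one-vs-rest) by full-batch subgradient descent on the L2-regularized hinge loss. A test utterance is then assigned to the speaker whose SVM gives the highest score. This solver and its hyperparameters (`lam`, `lr`, `epochs`) are illustrative choices only. A real system would more likely use a dedicated solver such as LIBSVM or LIBLINEAR, and nothing here reflects the kernel or settings used in the thesis.

```python
import numpy as np

def train_linear_svm(X, y, lam=1e-3, lr=0.1, epochs=500):
    """Binary linear SVM with labels y in {-1, +1}: minimise lam/2 ||w||^2 + mean hinge loss."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1.0                              # samples inside or beyond the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per speaker: that speaker's utterances vs. everyone else's."""
    classes = np.unique(labels)
    models = [train_linear_svm(X, np.where(labels == c, 1.0, -1.0), **kwargs) for c in classes]
    return classes, models

def identify(X, classes, models):
    """Closed-set identification: pick the speaker whose SVM scores highest."""
    scores = np.stack([X @ w + b for w, b in models], axis=1)
    return classes[np.argmax(scores, axis=1)]
```

One-vs-rest fits closed-set speaker identification directly: each enrolled speaker gets one decision function, and the argmax over scores selects the speaker.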
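For part (3), a bottleneck feature is the output of a deliberately narrow hidden layer in a DNN trained for some classification task. After training, frames are propagated up to that layer, and its outputs serve as features in place of, or alongside, conventional features such as MFCCs. The sketch below shows only the extraction step, using randomly initialized weights as a stand-in for a trained network. The layer sizes, the ReLU activations, taking the bottleneck's linear output, and mean-pooling frames into one utterance-level vector for the SVM are all assumptions for illustration. Training the network is omitted.

```python
import numpy as np

def init_mlp(layer_sizes, seed=0):
    """Random (untrained) weights with He-style scaling; a stand-in for a trained network."""
    rng = np.random.default_rng(seed)
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def bottleneck_features(frames, params, bottleneck_index):
    """Forward frames (T, D) and return the linear output of layer `bottleneck_index`."""
    h = np.asarray(frames, dtype=float)
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        if i == bottleneck_index:
            return z                  # bottleneck output, taken before any nonlinearity
        h = np.maximum(z, 0.0)        # ReLU on the layers below the bottleneck
    raise ValueError("bottleneck_index is beyond the last layer")

def utterance_vector(bn_frames):
    """Hypothetical pooling: average frame-level bottleneck features over the utterance."""
    return bn_frames.mean(axis=0)

# Hypothetical topology: 39-dim input frames, two 256-unit layers, a 32-unit bottleneck,
# then layers that would only be used during training (here 50 speaker outputs).
params = init_mlp([39, 256, 256, 32, 256, 50])
frames = np.random.default_rng(1).standard_normal((100, 39))
bn = bottleneck_features(frames, params, bottleneck_index=2)   # shape (100, 32)
vec = utterance_vector(bn)                                     # shape (32,)
```

Because the layers above the bottleneck are needed only for training, extraction stops at the bottleneck. The narrow width forces the network to compress what it learned into a compact representation.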
[Degree-granting institution]: Nanjing University of Posts and Telecommunications
[Degree level]: Master's
[Year conferred]: 2017
[Classification number (CLC)]: TN912.3; TP18
Article ID: 1917352
Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/1917352.html