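Text-Independent Speaker Age Recognition Based on Feature Subspace Quantization
Published: 2018-05-04 09:10
Topic: feature subspace quantization + MFCC; Source: Soochow University, 2016 master's thesis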
[Abstract]: Speaker age recognition identifies a speaker's age characteristics by analyzing the speech signal. As research on human-machine speech interaction deepens and its applications spread, the demand for natural interaction rises as well. Speaker age recognition lets a system correctly infer the speaker's age characteristics during interaction and adapt its interaction style accordingly, for example by choosing a suitable volume, speaking rate, intonation, and tone. The technique is widely applicable to automatic voice information query, health care, entertainment, and similar fields. This thesis proposes a Feature Subspace Quantization (FSSQ) scheme for text-independent speaker age recognition. The main idea is to partition the acoustic feature space of speech from speakers in the same age group into subspaces using clustering, and then to quantize each subspace, which reduces the distribution scatter of each pattern class and improves overall recognition accuracy. Mel-frequency cepstral coefficients (MFCC) are first extracted from the speech of speakers in the same age group. The K-Means algorithm then clusters the feature vectors to partition the feature space into subspaces, and the LBG algorithm quantizes each subspace to form a quantization codebook, so that the speech of each age group is ultimately represented by a set of quantization codebooks. Age is decided by the minimum average codebook distance. Experimental results show that the proposed FSSQ method outperforms typical methods such as vector quantization (VQ) and Gaussian mixture models (GMM), with overall in-set and out-of-set recognition rates of 89.8% and 58.6%, respectively.
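The pipeline described in the abstract (K-Means subspace partition, LBG quantization of each subspace, decision by minimum average codebook distance) can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: MFCC extraction is assumed to have been done already and is replaced here by synthetic 12-dimensional vectors; the age-group labels `child` and `adult`, the function names, the number of subspaces, and the codebook size are all hypothetical. The abstract does not say exactly how the average codebook distance is computed, so the sketch assumes it is the mean distance from each frame to its nearest codeword across all of an age group's codebooks.

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain K-Means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

def lbg(X, size, eps=0.01, iters=10):
    """LBG codebook by binary splitting; size is assumed to be a power of two."""
    codebook = X.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # Split every codeword into a slightly perturbed pair, then refine.
        codebook = np.vstack([codebook + eps, codebook - eps])
        for _ in range(iters):
            d = np.linalg.norm(X[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            for j in range(len(codebook)):
                if np.any(nearest == j):
                    codebook[j] = X[nearest == j].mean(axis=0)
    return codebook

def train_age_model(features, n_subspaces=2, codebook_size=4):
    """One age group -> a list of codebooks, one per non-empty subspace."""
    _, labels = kmeans(features, n_subspaces)
    return [lbg(features[labels == j], codebook_size)
            for j in range(n_subspaces) if np.any(labels == j)]

def avg_distance(features, codebooks):
    """Assumed distance: mean over frames of the nearest-codeword distance."""
    codewords = np.vstack(codebooks)
    d = np.linalg.norm(features[:, None, :] - codewords[None, :, :], axis=2)
    return d.min(axis=1).mean()

def classify(features, models):
    """Pick the age group whose codebooks give the smallest average distance."""
    return min(models, key=lambda age: avg_distance(features, models[age]))

# Synthetic stand-ins for MFCC frames (hypothetical data, not from the thesis).
rng = np.random.default_rng(42)
train = {
    "child": rng.normal(loc=-2.0, scale=1.0, size=(200, 12)),
    "adult": rng.normal(loc=2.0, scale=1.0, size=(200, 12)),
}
models = {age: train_age_model(x) for age, x in train.items()}
test_utterance = rng.normal(loc=2.0, scale=1.0, size=(50, 12))
print(classify(test_utterance, models))
```

Compared with a single VQ codebook per age group, partitioning first means each codebook only has to cover a more compact region of the feature space, which is the scatter-reduction idea the abstract refers to.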
[Degree-granting institution]: Soochow University
[Degree level]: Master's
[Year degree granted]: 2016
[Classification number]: TN912.34
Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/1842454.html