Research on the Noise Robustness of Speech Recognition Based on Deep Autoencoder Networks
Published: 2018-08-22 19:28
[Abstract]: Traditional radial basis function (RBF) neural networks used for speech recognition initialize the centers and radii of their basis functions randomly. To address this problem, and motivated by the hierarchical way the human brain processes speech, this paper proposes replacing random initialization with unsupervised pretraining, which initializes the network parameters from a large amount of unlabeled data. A deep autoencoder network is used as the acoustic model for speech recognition, and its noise robustness is analyzed on a speaker-independent, small-vocabulary isolated-word task using Mel frequency cepstrum coefficients (MFCC) and Gammatone frequency cepstrum coefficients (GFCC), the latter based on the Gammatone auditory filter. The experimental results show that, with MFCC features, the deep autoencoder network is more robust to noise than the RBF neural network, and that, with the deep autoencoder network, GFCC features give a relative improvement of 1.87% in average recognition rate over the classical MFCC features.
[Author affiliations]: College of Information Engineering, Taiyuan University of Technology; School of Computer Science and Technology, Tianjin University
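[Funding]: National Natural Science Foundation of China (No. 61371193, No. 61303109); Shanxi Province Selective Funding Program for Returned Overseas Scholars (Jin Ren She Ting Han [2013] No. 68); Natural Science Foundation of Shanxi Province (No. 2014021022-6)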
[Classification number]: TN912.34
Article ID: 2198084
Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/2198084.html