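Research on Speech Recognition Based on Deep Learning
Published: 2018-01-18 03:23
Keywords for this article: Research on Speech Recognition Based on Deep Learning. Source: Beijing University of Posts and Telecommunications, 2014 master's thesis. Document type: degree thesis
More related articles: speech recognition; deep learning; feature extraction; acoustic modeling; deep neural network; deep autoencoder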
[Abstract]: In the mobile Internet era, speech recognition is a key technology for natural human-computer interaction and merits in-depth study. At the same time, in the face of the big-data challenge, deep learning has become a research focus in pattern recognition because it can mine useful information from massive amounts of data. Studying speech recognition on the basis of deep learning theory therefore has both theoretical significance and practical value.
Deep learning is essentially an information-extraction technique built on multiple layers of nonlinear transformation; its hierarchical feature structure lets it model complex relationships in data. This thesis first introduces the basic principles and current state of research of speech recognition, then explains the basic theory of deep learning and its network models in detail, and then focuses on how to better apply deep learning to speech recognition.
1. Acoustic feature extraction based on deep autoencoders. Good acoustic features are critical to the performance of a speech recognition system. Starting from the basic principles of the deep autoencoder, the thesis examines acoustic feature preprocessing, network structure (the number of hidden layers and the number of nodes), and parallel training algorithms for the network. An autoencoder operating on speech features is built on the Matlab platform, and unsupervised and supervised training are each used to extract more robust speech features from the original MFCC features. Finally, tests on the 863 Chinese speech corpus with the HTK speech recognition framework show that, compared with the original features, the new features extracted with unsupervised and supervised training raise word recognition accuracy by 1.96% and 3.53%, respectively.
2. Acoustic modeling based on DNN-HMM. The acoustic model is an indispensable part of a speech recognition system. By analyzing the similarities and differences between deep neural networks (DNNs) and Gaussian mixture models (GMMs) in structure and training, the thesis argues that a DNN can feasibly be used to describe the output probability distribution of HMM states. Acoustic models based on GMM-HMM and on DNN-HMM are both implemented on the open-source Kaldi speech recognition platform, and experiments on the RM corpus show that the DNN-HMM system achieves a 30% relative reduction in word error rate compared with the GMM-HMM system.
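The abstract says a DNN can describe the HMM state output distribution but does not spell out how. In the common hybrid DNN-HMM setup (assumed here; the abstract does not confirm that the thesis uses exactly this recipe), the network outputs state posteriors P(s | x), and dividing by the state priors P(s) gives a scaled likelihood that can stand in for the GMM likelihood during decoding. The sketch below illustrates that conversion in plain Python; the function names, network outputs, and priors are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw DNN output activations into state posteriors P(s | x)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_log_likelihoods(logits, state_priors):
    """Return log P(s | x) - log P(s) for every HMM state s.

    By Bayes' rule p(x | s) = P(s | x) * p(x) / P(s). The term p(x) is the
    same for all states within a frame, so decoding can ignore it.
    """
    posteriors = softmax(logits)
    return [math.log(p) - math.log(q) for p, q in zip(posteriors, state_priors)]

# Hypothetical frame: three HMM states, made-up network outputs and priors.
frame_logits = [2.0, 0.5, -1.0]
priors = [0.6, 0.3, 0.1]  # assumed: relative state frequencies in a training alignment
scores = scaled_log_likelihoods(frame_logits, priors)
```

In practice the priors are usually estimated by counting state occurrences in a forced alignment of the training data, and the resulting scores replace the GMM log-likelihoods in the Viterbi search.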
[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TN912.34
Article ID: 1439246
Article link: http://sikaile.net/kejilunwen/wltx/1439246.html