Research on Speech Recognition Based on Deep Learning

Published: 2018-01-18 03:23

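Keywords: Research on Speech Recognition Based on Deep Learning; Source: Beijing University of Posts and Telecommunications, 2014 master's thesis; Type: degree thesis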

[Abstract]: In the mobile Internet era, speech recognition is a key technology for natural human-computer interaction and merits in-depth study. At the same time, faced with the challenge of big data, deep learning has become a research hotspot in pattern recognition because it can mine useful information from massive data. Studying speech recognition on the basis of deep learning theory therefore has both theoretical significance and practical value. Deep learning is essentially an information extraction technique that applies multiple layers of nonlinear transformations; its hierarchical feature structure allows it to model complex relationships in data. This thesis first introduces the basic principles and current state of research of speech recognition, then explains the basic theory of deep learning and its network models in detail, and finally focuses on how to better apply deep learning theory to speech recognition.

1. Acoustic feature extraction based on deep autoencoders. Good acoustic features are critical to the performance of a speech recognition system. Starting from the basic principles of the deep autoencoder, the thesis examines acoustic feature preprocessing, network structure (number of hidden layers and number of nodes), and parallel training algorithms for the network. An autoencoder operating on speech features is built in Matlab, and unsupervised and supervised training are each used to extract more robust speech features from the original MFCC features. Finally, tests on the 863 Chinese speech corpus using the HTK speech recognition framework show that, compared with the original features, the new features extracted by unsupervised and supervised training improve word recognition accuracy by 1.96% and 3.53%, respectively.

2. Acoustic modeling based on DNN-HMM. The acoustic model is an indispensable component of a speech recognition system. By analyzing the similarities and differences between deep neural networks (DNN) and Gaussian mixture models (GMM) in structure and training, the thesis explains why a DNN can be used to describe the output probability distribution of HMM states. Acoustic models based on GMM-HMM and on DNN-HMM are implemented on the open-source Kaldi speech recognition platform, and experiments on the RM corpus show that the DNN-HMM system achieves a 30% relative reduction in word error rate compared with the GMM-HMM system.
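As an illustration of how a DNN can stand in for the HMM state output distribution, the sketch below shows the conversion commonly used in hybrid DNN-HMM systems: the network's per-frame state posteriors are divided by the state priors to obtain scaled likelihoods. The abstract does not say which formulation the thesis uses, so this is an assumption, and the function names, logits, and priors are hypothetical values chosen only for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_log_likelihoods(logits, state_priors):
    """Turn DNN outputs for one frame into HMM emission scores.

    By Bayes' rule, p(x|s) = p(s|x) * p(x) / p(s). The term p(x) is the
    same for every state within a frame, so log p(s|x) - log p(s) is
    enough to rank states during decoding.
    """
    posteriors = softmax(logits)
    return [math.log(p) - math.log(q)
            for p, q in zip(posteriors, state_priors)]


# Hypothetical values: three HMM states, one acoustic frame.
frame_logits = [2.0, 0.5, -1.0]
priors = [0.7, 0.2, 0.1]  # assumed to come from state frequencies in training alignments
scores = scaled_log_likelihoods(frame_logits, priors)
```

Dividing by the prior keeps frequent states from dominating the decoder simply because the network saw them more often during training.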
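[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TN912.34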

Article ID: 1439246

Link to this article: http://sikaile.net/kejilunwen/wltx/1439246.html
