Research on an Embedded Speech Recognition System Based on Neural Networks

Posted: 2018-07-28 16:56

[Abstract]: Speech recognition is the technology by which a machine, running a dedicated program, converts human speech into the corresponding text or commands. In recent years, driven by rapid advances in computer hardware and communication networks, speech recognition research has produced many encouraging results, and a number of relatively mature products have reached the market. The emergence of an operating mode that combines local recognition with cloud-side processing can resolve the long-standing problem of limited computing power and storage space in embedded speech recognition systems, so that researchers can concentrate on improving recognition accuracy. Many classical recognition algorithms have long been based on linear system theory, yet human speech production is a complex nonlinear process, so recognizers built on linear system theory have limitations in real-world environments. With the goal of improving the accuracy and generalization ability of speech recognition systems, this thesis carries out related research and experiments. A speech recognition system generally comprises speech preprocessing, feature parameter extraction, a recognition model, and speech synthesis. The thesis first reviews the history of speech recognition technology and the state of research in China and abroad, then analyzes each stage theoretically, covering the theory and algorithms of speech acquisition, preprocessing, endpoint detection, feature parameter extraction, the time-normalization network, and the recognition model. MFCCs are chosen as the speech feature parameters, and a complete design for a speech recognition system is presented. The thesis focuses mainly on the choice of recognition model: after comparing various recognition algorithms, it selects the BP (backpropagation) neural network as the basic unit of the recognition model.
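
The abstract only names MFCCs as the feature parameters and does not give the thesis's frame length, filter count, or other settings. As a rough illustration, the pure-Python sketch below computes the MFCCs of a single frame through the usual chain: pre-emphasis, Hamming window, power spectrum, mel filterbank, logarithm, and DCT. The 8 kHz sampling rate, 24 filters, 13 coefficients, and pre-emphasis factor 0.97 are assumed typical values, not the thesis's parameters, and `mfcc_frame`, `hz_to_mel`, and `mel_to_hz` are hypothetical helper names. A naive DFT stands in for the FFT to keep the code dependency-free.

```python
import math

def hz_to_mel(f):
    # Widely used mel-scale formula: m = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sample_rate=8000, n_filters=24, n_ceps=13, alpha=0.97):
    """Return n_ceps MFCCs for one frame (a list of at least 2 samples). Settings are assumptions."""
    n = len(frame)
    # 1. Pre-emphasis boosts high frequencies: y[t] = x[t] - alpha * x[t-1]
    emph = [frame[0]] + [frame[t] - alpha * frame[t - 1] for t in range(1, n)]
    # 2. Hamming window reduces spectral leakage at the frame edges
    win = [emph[t] * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1))) for t in range(n)]
    # 3. Power spectrum from a naive O(n^2) DFT, bins 0..n//2 (an FFT would be used in practice)
    n_bins = n // 2 + 1
    power = []
    for k in range(n_bins):
        real = sum(win[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        imag = -sum(win[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power.append((real * real + imag * imag) / n)
    # 4. Triangular filters spaced evenly on the mel scale from 0 Hz to the Nyquist frequency
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    log_energies = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        energy = 0.0
        for k in range(left, min(right, n_bins - 1) + 1):
            if k < center:
                weight = (k - left) / (center - left)  # rising edge
            elif right > center:
                weight = (right - k) / (right - center)  # falling edge
            else:
                weight = 0.0  # degenerate filter (all three bins coincide)
            energy += weight * power[k]
        # Small floor keeps the log finite when a filter captures no energy
        log_energies.append(math.log(energy + 1e-10))
    # 5. Unnormalised DCT-II decorrelates the log filterbank energies
    return [
        sum(e * math.cos(math.pi * i * (j + 0.5) / n_filters) for j, e in enumerate(log_energies))
        for i in range(n_ceps)
    ]
```

In a complete front end, the signal would first be split into overlapping frames, and endpoint detection would discard silent frames before this function is applied to each remaining frame; the framing scheme is likewise not specified in the abstract.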
To address the accuracy problem of the speech recognition system and the shortcomings of the BP neural network algorithm, the thesis introduces neural network ensemble theory. To increase the diversity among the individual networks in the ensemble, the K-means clustering method is used to improve the stage that generates the individual networks, and multiple BP networks are finally combined into the recognition model used in this thesis. To verify the effectiveness of the method, a speech recognition system combining MFCC feature parameters with the improved BP neural network ensemble was designed and developed on both the MATLAB and VC6.0 platforms; performance analysis and comparison of the simulation results confirm the effectiveness of the method. Finally, building on a survey of current embedded systems research, the thesis adopts the popular Android mobile operating system and, for a specific hardware platform, describes in detail the software architecture of Android and the procedure for setting up the application development environment. Android 2.3.4 was successfully customized on an ARM11-based development board, and a simple application was run on that platform.
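
The abstract does not spell out how K-means clustering enters the generation of ensemble members. One plausible reading, sketched here purely as an assumption, is clustering-based selective ensembling: train many candidate BP networks (for example on bootstrap resamples), describe each candidate by which validation samples it classifies correctly, cluster those vectors with K-means, keep the most accurate candidate from each cluster, and combine the survivors by majority vote. To keep the sketch short, candidates appear only through their validation predictions; `kmeans`, `select_diverse_members`, and `majority_vote` are hypothetical helper names, not code from the thesis.

```python
import random
from collections import Counter

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means on equal-length float vectors; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def select_diverse_members(val_preds, val_true, k, seed=0):
    """Cluster candidate networks by error pattern; keep the most accurate candidate per cluster."""
    # Each candidate becomes a 0/1 vector: 1.0 where it classifies the validation sample correctly
    vectors = [[1.0 if p == t else 0.0 for p, t in zip(preds, val_true)] for preds in val_preds]
    labels = kmeans(vectors, k, seed=seed)
    chosen = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if idx:
            chosen.append(max(idx, key=lambda i: sum(vectors[i])))
    return sorted(chosen)

def majority_vote(member_outputs):
    """Combine the selected members' class labels for one input by plurality vote."""
    return Counter(member_outputs).most_common(1)[0][0]
```

Keeping one network per cluster discourages near-duplicate members that make the same mistakes, which is the kind of individual diversity the abstract aims for; the number of clusters k then sets the ensemble size.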
[Degree-granting institution]: Guangdong University of Technology
[Degree level]: Master's
[Year awarded]: 2012
[Classification numbers]: TP368.1; TN912.34

[Citing literature]

Related master's theses (1 entry)

1 Bu Xuezhe; Research and Implementation of Speech Recognition Algorithms on the ARM-Linux Platform [D]; Hebei University of Science and Technology; 2013

Document ID: 2150946

Link to this article: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2150946.html
