Research on an Embedded Speech Recognition System Based on Neural Networks
Published: 2018-07-28 16:56
[Abstract]: Speech recognition is the technology by which a machine, through a specific program, converts human speech into the corresponding text or commands. In recent years, thanks to the rapid development of computer hardware and communication networks, research on speech recognition has produced many encouraging results, and a number of relatively mature products have appeared on the market. The rise of an operating model that combines local recognition with cloud technology can resolve the long-standing limits on computing power and storage space in embedded speech recognition systems, so that effort can be concentrated on improving recognition accuracy. Several classical recognition algorithms are based on linear system theory, but human speech production is in fact a complex nonlinear process, so recognition systems built on linear system theory have certain limitations in real environments. This thesis carries out research and experiments aimed at improving the accuracy and generalization ability of speech recognition systems.

A speech recognition system generally comprises speech preprocessing, feature parameter extraction, a recognition model, and speech synthesis. The thesis first reviews the history of speech recognition technology and its current state in China and abroad. It then studies the theory and algorithms of each stage: speech acquisition, preprocessing, endpoint detection, feature parameter extraction, the time-warping network, and the recognition model. MFCC is selected as the speech feature parameter, and a complete design for a speech recognition system is given. The main focus is the choice of recognition model: after comparing various recognition algorithms, the BP neural network is chosen as the basic unit of the model. To address recognition accuracy and the shortcomings of the BP algorithm, neural network ensemble theory is introduced. To increase the diversity among the individual networks in the ensemble, K-means clustering is used to improve the stage that generates the individual networks, and multiple BP networks are finally combined to form the recognition model of this thesis.

To verify the method, a speech recognition system combining MFCC feature parameters with the improved BP neural network ensemble was designed and developed on the MATLAB and VC6.0 platforms respectively. Performance analysis and comparison of the simulation results confirm the effectiveness of the method.

Finally, building on a survey of current embedded systems, the thesis selects the popular Android mobile operating system and, for a specific hardware platform, describes in detail the Android software architecture and the process of setting up the application development environment. The Android 2.3.4 operating system was successfully customized on a development board with an ARM11 core, and a simple application was run on that platform.
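The abstract does not say how K-means is applied when generating the individual networks, so the following is only a minimal sketch under stated assumptions, not the thesis's method. It assumes a leave-one-cluster-out scheme: the training vectors are clustered with K-means, and member i of the ensemble is trained on every sample except those in cluster i, so that the members differ from one another. The member outputs are then combined by simple averaging, which is also an assumption. The names `kmeans`, `BPNet`, `build_ensemble`, and `ensemble_predict`, the toy 2-D data standing in for MFCC vectors, and all hyperparameters are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20):
    """Lloyd's K-means; returns one cluster index per point."""
    # Deterministic start: k points spread evenly through the list.
    centers = [list(points[i * len(points) // k]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster went empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class BPNet:
    """One-hidden-layer network trained by plain backpropagation (SGD)."""

    def __init__(self, n_in, n_hidden, rng):
        # The last weight in every row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w1]
        y = sigmoid(sum(w * v for w, v in zip(self.w2, h + [1.0])))
        return h, y

    def train(self, samples, targets, epochs=200, lr=0.5):
        for _ in range(epochs):
            for x, t in zip(samples, targets):
                h, y = self.forward(x)
                # Output delta for squared error with a sigmoid output unit.
                d_out = (y - t) * y * (1.0 - y)
                # Hidden deltas are computed before the output weights change.
                d_hid = [d_out * self.w2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]
                for j, v in enumerate(h + [1.0]):
                    self.w2[j] -= lr * d_out * v
                for j, row in enumerate(self.w1):
                    for i, v in enumerate(x + [1.0]):
                        row[i] -= lr * d_hid[j] * v


def build_ensemble(samples, targets, k, seed=0):
    """Train k networks; member i never sees the samples of cluster i."""
    labels = kmeans(samples, k)
    rng = random.Random(seed)
    nets = []
    for i in range(k):
        keep = [n for n, lab in enumerate(labels) if lab != i]
        net = BPNet(len(samples[0]), 3, rng)
        net.train([samples[n] for n in keep], [targets[n] for n in keep])
        nets.append(net)
    return nets, labels


def ensemble_predict(nets, x):
    """Combine the members by simple averaging of their outputs."""
    return sum(net.forward(x)[1] for net in nets) / len(nets)


# Toy 2-D vectors standing in for MFCC features: two blobs per class.
data_rng = random.Random(1)
blobs = [((0.0, 0.0), 0), ((0.0, 2.0), 0), ((3.0, 0.0), 1), ((3.0, 2.0), 1)]
samples, targets = [], []
for (cx, cy), cls in blobs:
    for _ in range(8):
        samples.append([cx + data_rng.uniform(-0.3, 0.3), cy + data_rng.uniform(-0.3, 0.3)])
        targets.append(cls)

nets, cluster_labels = build_ensemble(samples, targets, k=4)
score = ensemble_predict(nets, [3.0, 1.0])  # a value nearer 1.0 favors class 1
```

In a real system the input vectors would be MFCC frames (after time normalization) and the output layer would have one unit per vocabulary word. Other combination rules, such as majority voting or weighted averaging, would fit the same structure.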
[Degree-granting institution]: Guangdong University of Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP368.1;TN912.34
Article ID: 2150946
[Citing documents]
Related master's theses (top 1)
1 Bu Xuezhe; Research and Implementation of Speech Recognition Algorithms on the ARM-Linux Platform [D]; Hebei University of Science and Technology; 2013
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2150946.html