Research on a Statistical-Model-Based Speech Recognition System and Its DSP Implementation

Published: 2018-06-25 02:30

Topics: speech recognition + MFCC; Source: Master's thesis, University of Electronic Science and Technology of China, 2012

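【Abstract】: Speech recognition uses the various features of human speech to determine the meaning of natural spoken language, or to identify who the speaker is. As speech recognition systems have matured, the technology has been widely applied in medicine, the military, aviation, the mobile internet and other fields. In recent years, with continual technical breakthroughs, embedded speech recognition systems have developed rapidly and now appear in many consumer electronics products, profoundly changing the traditional mode of human-machine interaction. Recognition accuracy and robustness are the key properties of a speech recognition system. From these two perspectives, this thesis studies the basic algorithms of an isolated-word speech recognition system, the implementation of an OOV (out-of-vocabulary) rejection algorithm, and the implementation of the system on a DSP platform.

First, the thesis describes in detail the basic principles and implementation techniques of speech recognition systems, concentrating on front-end processing of the speech signal, whose main tasks are endpoint detection and extraction of speech feature parameters. It then discusses how the speech models are built and implemented, with emphasis on HMM initialization and on how template parameters are merged.

Second, the output of a speech recognition system inevitably contains some misrecognitions, which seriously affect the system's robustness and recognition accuracy, so OOV speech must be rejected. Considering the complexity and cost of implementing the system on an embedded platform, this thesis adopts a rejection algorithm based on posterior-probability features and LVQ (learning vector quantization), and proposes feature parameters for rejection that capture well how OOV and IV (in-vocabulary) utterances differ in their posterior probabilities. Vectors composed of a class label and these feature parameters are fed to an LVQ network for training, giving the network the ability to distinguish the OOV and IV classes. The system's rejection ability is then tested with networks trained on different input vectors and with different test sets, and the IV rejection rate and OOV acceptance rate are reported for each case. The results show that the system can reject more than 98% of OOV speech while rejecting about 2.6% of IV speech.

Finally, after the algorithms were implemented on a PC platform, the thesis studies the implementation of the isolated-word speech recognition system on a DSP platform. It first examines the DSP platform's processor architecture, memory architecture, the connections between the chips on the platform, and the configuration of each interface, and describes in particular detail how to use the audio processing chip. It then presents the software design flow and describes how the speech recognition algorithms were ported from the PC platform to the DSP platform. Next, it studies bootloading, so that the system can run standalone without the emulator or the development environment. The end result is a general-purpose, DSP-based isolated-word speech recognition system.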
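The abstract names endpoint detection as the core of front-end processing but does not say which detector the thesis uses. As an illustration only, the following Python sketch implements a classic dual-threshold detector on short-time energy and zero-crossing count. The frame length (256 samples), hop (128), threshold ratios and all function names are assumptions introduced here, not values from the thesis; in particular, the thresholds are taken as fractions of the peak frame energy, a simplification of estimating them from background noise.

```python
def frame_signal(x, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_count(frame):
    """Number of sign changes between consecutive samples (0 counts as positive)."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def detect_endpoints(x, frame_len=256, hop=128, high_ratio=0.5, low_ratio=0.1,
                     zcr_thresh=None):
    """Return (first_frame, last_frame) of the detected utterance, or None.

    Assumed dual-threshold scheme (not taken from the thesis): frames whose
    energy exceeds the high threshold are certainly speech; the segment is
    then grown outward while energy stays above the low threshold or, if
    zcr_thresh is given, while the zero-crossing count exceeds it (to keep
    low-energy unvoiced consonants). Thresholds are fractions of the peak
    frame energy here, a simplification of noise-floor estimation.
    """
    frames = frame_signal(x, frame_len, hop)
    if not frames:
        return None  # signal shorter than one frame
    energy = [short_time_energy(f) for f in frames]
    zcr = [zero_crossing_count(f) for f in frames]
    peak = max(energy)
    if peak == 0:
        return None  # pure silence
    high, low = high_ratio * peak, low_ratio * peak

    def active(i):
        return energy[i] > low or (zcr_thresh is not None and zcr[i] > zcr_thresh)

    core = [i for i, e in enumerate(energy) if e > high]
    if not core:
        return None  # only possible if high_ratio >= 1
    start, end = core[0], core[-1]
    while start > 0 and active(start - 1):
        start -= 1
    while end < len(frames) - 1 and active(end + 1):
        end += 1
    return start, end
```

Because the first and last high-energy frames bound the segment, short pauses inside one utterance do not split it, which suits isolated-word recognition; a continuous-speech front end would need a different rule.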
[Abstract]: With the development of speech recognition systems, speech recognition technology has been widely used in medicine, the military, aviation, the mobile internet and other fields. In recent years, with breakthroughs in various technologies, embedded speech recognition systems have developed rapidly and have profoundly changed the traditional mode of human-machine interaction. Recognition accuracy and robustness are the key properties of a speech recognition system.

Firstly, the basic principles and implementation techniques of speech recognition systems are described in detail. The discussion concentrates on front-end processing of the speech signal, whose main tasks are endpoint detection and extraction of speech feature parameters. Then the building and implementation of the speech models are discussed, with emphasis on HMM initialization and on how to merge template parameters.
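MFCC is one of the topic keywords and a standard choice of speech feature parameters. The sketch below computes one MFCC vector per frame in plain Python, assuming 8 kHz audio, 256-sample frames, a pre-emphasis coefficient of 0.97, a Hamming window, 26 triangular mel filters and 13 cepstral coefficients. These settings and all function names are illustrative assumptions, since the abstract gives none of them, and a naive DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame, n_fft):
    """Periodogram |X[k]|^2 / n_fft for bins k = 0..n_fft/2 via a naive O(N^2) DFT."""
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    spec = []
    for k in range(n_fft // 2 + 1):
        acc = sum(padded[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n in range(n_fft))
        spec.append(abs(acc) ** 2 / n_fft)
    return spec


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centres spaced uniformly on the mel scale."""
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mels = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):  # rising edge
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):  # falling edge
            weights[k] = (right - k) / (right - centre)
        bank.append(weights)
    return bank


def dct_ii(values, n_coeffs):
    """Unnormalised DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc(frame, sample_rate=8000, n_fft=256, n_filters=26, n_coeffs=13):
    """Pre-emphasis -> Hamming window -> power spectrum -> mel filterbank -> log -> DCT."""
    emphasised = [frame[0]] + [frame[i] - 0.97 * frame[i - 1] for i in range(1, len(frame))]
    windowed = [s * w for s, w in zip(emphasised, hamming(len(frame)))]
    spec = power_spectrum(windowed, n_fft)
    log_energies = []
    for weights in mel_filterbank(n_filters, n_fft, sample_rate):
        e = sum(w * p for w, p in zip(weights, spec))
        log_energies.append(math.log(max(e, 1e-10)))  # floor avoids log(0) on silence
    return dct_ii(log_energies, n_coeffs)
```

On an embedded target one would normally swap the naive DFT for the platform's FFT routine and precompute the window and filterbank tables once rather than per frame.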

Secondly, misrecognitions are hard to avoid entirely in the output of a speech recognition system, and they can seriously affect the system's robustness and recognition accuracy, so OOV (out-of-vocabulary) speech must be rejected. Considering the complexity and cost of implementing the system on an embedded platform, this thesis adopts a rejection algorithm based on posterior-probability features and LVQ (learning vector quantization): feature parameters derived from posterior probabilities are fed to an LVQ network for training, so that the network can distinguish OOV from IV (in-vocabulary) speech. The results show that the system can reject more than 98% of OOV speech while rejecting about 2.6% of IV speech.
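The rejection stage described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's algorithm: word posteriors are computed as a softmax over per-word HMM log-likelihoods (equal word priors assumed), and the three features (top-1 posterior, top-1 minus top-2 posterior, normalised entropy) are hypothetical stand-ins for the feature parameters the thesis proposes but the abstract does not list. The classifier is plain LVQ1 with two prototypes per class, where the class label ("IV" or "OOV") supervises training; the function names, learning-rate schedule and epoch count are likewise assumptions.

```python
import math
import random


def posteriors(log_likelihoods):
    """Word posteriors P(w|X), assuming equal word priors: a softmax over the
    per-word HMM log-likelihoods (max subtracted for numerical stability)."""
    m = max(log_likelihoods)
    exps = [math.exp(ll - m) for ll in log_likelihoods]
    total = sum(exps)
    return [e / total for e in exps]


def rejection_features(log_likelihoods):
    """Hypothetical 3-D feature vector (needs >= 2 vocabulary words):
    top-1 posterior, top-1 minus top-2 posterior, normalised posterior entropy."""
    p = sorted(posteriors(log_likelihoods), reverse=True)
    entropy = -sum(q * math.log(q) for q in p if q > 0)
    return [p[0], p[0] - p[1], entropy / math.log(len(p))]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_lvq1(samples, labels, protos_per_class=2, epochs=30, lr=0.1, seed=0):
    """LVQ1: move the nearest prototype toward a same-class sample and away
    from a different-class sample; returns a list of (vector, label) pairs."""
    rng = random.Random(seed)
    protos = []
    for cls in sorted(set(labels)):
        members = [s for s, lab in zip(samples, labels) if lab == cls]
        for s in rng.sample(members, min(protos_per_class, len(members))):
            protos.append((list(s), cls))  # initialise from class members
    order = list(range(len(samples)))
    for epoch in range(epochs):
        alpha = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            w, cls = min(protos, key=lambda p: _sq_dist(p[0], x))
            sign = 1.0 if cls == y else -1.0
            for d in range(len(w)):
                w[d] += sign * alpha * (x[d] - w[d])  # updates the prototype in place
    return protos


def classify(protos, x):
    """Label of the nearest prototype ("IV" or "OOV")."""
    return min(protos, key=lambda p: _sq_dist(p[0], x))[1]


def error_rates(protos, samples, labels):
    """(IV rejection rate, OOV acceptance rate) on a labelled test set."""
    iv = [s for s, lab in zip(samples, labels) if lab == "IV"]
    oov = [s for s, lab in zip(samples, labels) if lab == "OOV"]
    iv_rejected = sum(classify(protos, s) == "OOV" for s in iv)
    oov_accepted = sum(classify(protos, s) == "IV" for s in oov)
    return iv_rejected / len(iv), oov_accepted / len(oov)
```

The two rates returned by `error_rates` are the quantities the abstract reports (rejecting more than 98% of OOV speech corresponds to an OOV acceptance rate below 2%); the sketch itself makes no attempt to reproduce those figures, which depend on the thesis's real features and data.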

Finally, after the system's algorithms were implemented on a PC platform, the implementation of the isolated-word speech recognition system on a DSP platform is studied. First, the processor architecture and memory architecture of the DSP platform, the connections between the chips on the platform, and the configuration of each interface are discussed. Then the design flow of the system software is presented, and the porting of the speech recognition algorithms from the PC platform to the DSP platform is described.
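【Degree-granting institution】: University of Electronic Science and Technology of China
【Degree level】: Master's
【Year awarded】: 2012
【Classification number】: TN912.34; TP368.1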

Link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2064119.html
