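Research on a Speech Recognition System Based on Statistical Models and Its DSP Implementation
Published: 2018-06-25 02:30
Topic: speech recognition + MFCC; Source: University of Electronic Science and Technology of China, master's thesis, 2012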
[Abstract]: Speech recognition uses the acoustic characteristics of a person's voice to determine the meaning of natural speech or to identify the speaker. As speech recognition systems have matured, the technology has been widely applied in medicine, the military, aviation, the mobile internet, and other fields. In recent years, with continuing technical breakthroughs, embedded speech recognition systems have developed rapidly and now appear in many consumer electronics products, profoundly changing the traditional mode of human-computer interaction. Recognition accuracy and robustness are the key properties of a speech recognition system. This thesis studies, from these two perspectives, the basic algorithms of an isolated-word speech recognition system, the implementation of an out-of-vocabulary (OOV) rejection algorithm, and the implementation of the system on a DSP platform.
First, the thesis describes in detail the basic principles and implementation techniques of speech recognition systems. It mainly discusses front-end processing of the speech signal, with emphasis on endpoint detection and the extraction of speech feature parameters. It then covers the construction and implementation of the speech model, focusing on HMM initialization and on how to merge template parameters.
Second, misrecognition can never be entirely avoided in a recognizer's output, which seriously harms robustness and recognition accuracy, so OOV speech must be rejected. Considering the complexity and cost of implementation on an embedded platform, the thesis adopts a rejection algorithm based on posterior-probability features and LVQ, and proposes a set of feature parameters for rejection that capture the difference between OOV and in-vocabulary (IV) speech in terms of posterior probability. Vectors composed of the class label and the feature parameters are fed to an LVQ network for training, so that the network learns to distinguish the OOV and IV classes. Finally, networks trained on different input vectors are evaluated on different test sets, and the IV rejection rate and OOV acceptance rate are reported for each case. The results show that the system rejects more than 98% of OOV speech while rejecting about 2.6% of IV speech.
Finally, after the algorithms were implemented on a PC, the thesis studies the implementation of the isolated-word recognition system on a DSP platform. It first examines the platform's processor architecture, memory architecture, the connections between the chips on the DSP platform, and the configuration of each interface, with a particularly detailed account of how the audio processing chip is used. It then presents the design flow of the system software and describes how the recognition algorithms were ported from the PC to the DSP. Next, it studies system bootloading so that the system can run without the emulator and development environment. The end result is a general-purpose, DSP-based isolated-word speech recognition system.
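The LVQ-based rejection step described in the abstract can be illustrated with a minimal sketch of the standard LVQ1 update rule. This is not the thesis's implementation: the abstract does not name its rejection features, so the two features used here (the top posterior and the gap between the top two posteriors), the synthetic data, and all function names are assumptions made purely for illustration.

```python
import random

def nearest(protos, x):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(protos)),
               key=lambda k: sum((p - xi) ** 2 for p, xi in zip(protos[k], x)))

def train_lvq1(samples, labels, init_protos, proto_labels,
               lr=0.1, epochs=30, seed=0):
    """LVQ1: move the winning prototype toward a sample of its own class,
    and away from a sample of the other class."""
    rng = random.Random(seed)
    protos = [list(p) for p in init_protos]
    order = list(range(len(samples)))
    for epoch in range(epochs):
        rng.shuffle(order)
        rate = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for i in order:
            x, y = samples[i], labels[i]
            k = nearest(protos, x)
            sign = 1.0 if proto_labels[k] == y else -1.0
            protos[k] = [p + sign * rate * (xi - p)
                         for p, xi in zip(protos[k], x)]
    return protos

def classify(protos, proto_labels, x):
    """Label of the nearest prototype: 'IV' (accept) or 'OOV' (reject)."""
    return proto_labels[nearest(protos, x)]

# Synthetic, made-up data. Hypothetical features per utterance:
# (top posterior, gap between the top two posteriors).
data_rng = random.Random(1)

def cloud(cx, cy, n):
    return [(data_rng.gauss(cx, 0.05), data_rng.gauss(cy, 0.05))
            for _ in range(n)]

iv = cloud(0.9, 0.6, 40)    # assumed: IV words score high with a clear margin
oov = cloud(0.5, 0.1, 40)   # assumed: OOV words score lower with a small margin
samples = iv + oov
labels = ["IV"] * len(iv) + ["OOV"] * len(oov)

proto_labels = ["IV", "OOV"]
protos = train_lvq1(samples, labels, [iv[0], oov[0]], proto_labels)

print(classify(protos, proto_labels, (0.92, 0.58)))
print(classify(protos, proto_labels, (0.45, 0.12)))
```

One prototype per class keeps the classifier down to a handful of multiply-adds per decision, which fits the abstract's concern with complexity and cost on an embedded platform. A real system would likely use several prototypes per class and features measured on actual recognizer output.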
[Degree-granting institution]: University of Electronic Science and Technology of China
[Degree level]: Master's
[Year conferred]: 2012
[Classification number]: TN912.34; TP368.1
Article ID: 2064119
Article link: http://sikaile.net/kejilunwen/jisuanjikexuelunwen/2064119.html