Research on an MPE-Based Discriminative Acoustic Model for Lhasa Tibetan

Published: 2018-09-08 18:42
[Abstract]: Acoustic modeling is one of the most critical problems in speech recognition, and the accuracy of the acoustic model directly affects the recognition performance of the whole system. How to build a more accurate acoustic model has long been a research focus. This thesis aims to improve the accuracy of the acoustic model parameters: it estimates the parameters of triphone models with the minimum phone error criterion and thereby obtains an acoustic model with better recognition performance. The key to large-vocabulary continuous speech recognition is to build and train acoustic models that accurately describe the acoustic features, and the choice of training criterion has a large effect on the recognition rate. There are many ways to train an acoustic model. Traditional methods train only the inside of each model and do not make the models distinguishable from one another. Discriminative training is commonly used to address this problem. Unlike traditional training, discriminative training algorithms take the boundary information between models into account, so they can produce acoustic models with better recognition performance. This thesis studies discriminative training of acoustic models on a large-vocabulary continuous speech recognition system for Lhasa Tibetan. The specific research content and contributions are as follows. The thesis examines the traditional maximum likelihood estimation training algorithm, which is based on a generative criterion, and the minimum phone error (Minimum Phone Error, MPE) training algorithm, which is based on a discriminative criterion. An experimental platform for each training algorithm was built with the HTK toolkit, and Lhasa Tibetan acoustic models were built with both methods. Five experiments were carried out. Experiment 1 compared modeling units and found that triphone models give better recognition performance. Experiment 2 tested different numbers of Gaussian mixture components. Experiment 3 varied the penalty factor and found that a critical value must be located to improve recognition performance. Experiment 4 varied the size of the phone lattice and found that it should be set according to the actual situation. Experiment 5 tested whether to add the I-smoothing function and found that recognition performance is better with it. The experimental results show that, compared with the traditional generative training method for acoustic models, the minimum phone error training algorithm improves the phone recognition rate. Relative to the maximum likelihood estimation criterion, the correct recognition rate increased by 7.15% for monophones and by 7.78% for triphones.
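As a rough illustration of the MPE criterion named in the abstract, the sketch below computes the expected phone accuracy of a single utterance over a short list of competing hypotheses, weighting each hypothesis by its scaled posterior probability. It is a toy under stated assumptions, not the thesis's setup: the thesis works with HTK and phone lattices, whereas the function name `mpe_objective`, the N-best style input, and the acoustic scale `kappa` are hypothetical and chosen only for illustration.

```python
import math


def mpe_objective(hyps, kappa=1.0):
    """Expected phone accuracy of one utterance (toy N-best version).

    hyps is a list of (acoustic_loglik, lm_logprob, phone_accuracy) tuples,
    one per competing hypothesis. All names here are illustrative assumptions.
    """
    # Scaled log score of each hypothesis: kappa * log p(O|s) + log P(s).
    scores = [kappa * acoustic + lm for acoustic, lm, _ in hyps]

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(score - top) for score in scores]
    total = sum(weights)

    # Posterior-weighted average of the per-hypothesis phone accuracies.
    return sum(w * acc for w, (_, _, acc) in zip(weights, hyps)) / total


# Two equally likely hypotheses with accuracies 4 and 2.
print(mpe_objective([(-10.0, -1.0, 4.0), (-10.0, -1.0, 2.0)]))
```

MPE training adjusts the model parameters so that this expected accuracy, summed over all training utterances, increases. Hypotheses with high phone accuracy gain posterior weight and those with low accuracy lose it.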
[Degree-granting institution]: Northwest Minzu University
[Degree level]: Master's
[Year awarded]: 2017
[Classification number]: TN912.34


Article ID: 2231368


Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/2231368.html

