Research on a Discriminative Acoustic Model for the Tibetan Lhasa Dialect Based on MPE
[Abstract]: The acoustic model is one of the most important components of a speech recognition system, and its accuracy directly determines recognition performance. How to build a more accurate acoustic model has therefore long been a research focus. To improve the accuracy of the acoustic model parameters, this thesis estimates the parameters of a triphone model with the minimum phone error criterion, which yields an acoustic model with better recognition performance.

The key to large vocabulary continuous speech recognition is to build and train acoustic models that describe the acoustic features accurately. The choice of training criterion has a strong influence on the recognition rate. Acoustic models can be trained in many ways. Traditional training fits each model only to its own data, so it cannot make different models distinguishable from one another. Discriminative training is the usual remedy for this problem. Unlike traditional training, a discriminative training algorithm takes the boundaries between models into account, so it can produce acoustic models with better recognition performance.

This thesis studies discriminative training of acoustic models on a Tibetan Lhasa large vocabulary continuous speech recognition platform. The specific research content and contributions are as follows. The thesis studies two algorithms:
- the traditional maximum likelihood estimation (MLE) training algorithm, which is based on a generative criterion;
- the Minimum Phone Error (MPE) training algorithm, which is based on a discriminative criterion.

An experimental platform for each algorithm is built with the HTK toolkit, and acoustic models of the Tibetan Lhasa dialect are trained with both methods. Five experiments were carried out.
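The sketch below makes the MPE criterion concrete. It computes the expected phone accuracy of one utterance, sum over s of P_kappa(s|O) * A(s, s_ref), over a small N-best list instead of a full phone lattice. It is an illustration under stated assumptions, not the training code used in the thesis. The following are all hypothetical choices:
- the N-best approximation;
- the raw accuracy A, taken as the number of reference phones minus the Levenshtein edit distance;
- the tuple layout of a hypothesis;
- the default acoustic scale kappa = 1/12.

Lattice-based MPE as described by Povey and Woodland uses an approximate, per-arc phone accuracy instead.

```python
import math
from typing import List, Sequence, Tuple

# A hypothesis is (phone sequence, acoustic log-likelihood, LM log-probability).
# This tuple layout is a hypothetical choice for illustration only.
Hypothesis = Tuple[Sequence[str], float, float]


def phone_accuracy(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Raw phone accuracy: reference length minus Levenshtein edit distance."""
    prev = list(range(len(ref) + 1))  # distance from empty hyp to ref[:j]
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # insertion: extra hypothesis phone
                cur[j - 1] + 1,          # deletion: missed reference phone
                prev[j - 1] + (h != r),  # match (0) or substitution (1)
            )
        prev = cur
    return len(ref) - prev[len(ref)]


def mpe_objective(hyps: List[Hypothesis], ref: Sequence[str],
                  kappa: float = 1.0 / 12) -> float:
    """Expected phone accuracy sum_s P_kappa(s|O) * A(s, ref) over an N-best list."""
    # Scaled joint log scores; subtract the max before exp for numerical stability.
    scores = [kappa * (ac + lm) for _, ac, lm in hyps]
    best = max(scores)
    weights = [math.exp(s - best) for s in scores]
    total = sum(weights)
    return sum(
        (w / total) * phone_accuracy(phones, ref)
        for w, (phones, _, _) in zip(weights, hyps)
    )


if __name__ == "__main__":
    ref = ["a", "b", "c"]
    nbest = [(["a", "b", "c"], -100.0, -5.0),
             (["a", "x", "c"], -100.0, -5.0)]
    print(mpe_objective(nbest, ref))  # equal posteriors: 0.5*3 + 0.5*2 = 2.5
```

Maximizing this quantity shifts posterior mass toward hypotheses with fewer phone errors. That is the contrast with MLE, whose objective rewards only the likelihood of the reference transcription.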
The five experiments were:
- Experiment 1 compares modeling units and shows that the triphone model gives better recognition performance.
- Experiment 2 examines the number of Gaussian mixture components.
- Experiment 3 tunes the penalty factor; a critical value has to be found to improve recognition.
- Experiment 4 varies the size of the phone lattice, which has to be set according to the actual situation.
- Experiment 5 tests whether to add I-smoothing; recognition improves once I-smoothing is added.

The experimental results show that MPE training improves the phone recognition rate compared with traditional generative (MLE) training. Relative to the maximum likelihood estimation criterion, the correct recognition rate increases by 7.15 for monophone models and by 7.78 for triphone models.
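Experiment 5 concerns I-smoothing. The sketch below shows how it can enter the Extended Baum-Welch (EBW) update of a single Gaussian mean. It assumes the standard formulation in which tau "points" of ML statistics are added to the numerator statistics before the update. The function name, argument layout and numeric values are hypothetical. The smoothing constant D is simply passed in here, whereas practical systems choose it per Gaussian so that the update stays stable.

```python
from typing import List, Sequence


def ebw_mean_update(theta_num: Sequence[float], gamma_num: float,
                    theta_den: Sequence[float], gamma_den: float,
                    theta_ml: Sequence[float], gamma_ml: float,
                    mu_old: Sequence[float], D: float, tau: float) -> List[float]:
    """Hypothetical EBW mean update for one Gaussian with I-smoothing.

    theta_*: first-order statistics sum_t gamma(t) * o_t (one value per dimension)
    gamma_*: occupancy counts (zeroth-order statistics)
    """
    if gamma_ml <= 0:
        raise ValueError("I-smoothing needs a positive ML occupancy")
    # I-smoothing: add tau points of ML statistics to the numerator side.
    num_theta = [tn + tau * tm / gamma_ml for tn, tm in zip(theta_num, theta_ml)]
    num_gamma = gamma_num + tau
    denom = num_gamma - gamma_den + D
    if denom <= 0:
        raise ValueError("D is too small for a stable update")
    # EBW: (num stats - den stats + D * old mean) / (num occ - den occ + D)
    return [(nt - td + D * m) / denom
            for nt, td, m in zip(num_theta, theta_den, mu_old)]


if __name__ == "__main__":
    new_mu = ebw_mean_update([2.0], 1.0, [1.0], 1.0, [4.0], 2.0, [0.0],
                             D=2.0, tau=2.0)
    print(new_mu)  # [1.25]
```

With tau = 0 this reduces to the plain EBW update. As tau grows, the new mean is pulled toward the ML estimate theta_ml / gamma_ml, which makes the discriminative estimate more robust when its statistics are sparse.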
[Degree-granting institution]: Northwest Minzu University
[Degree level]: Master's
[Year of degree]: 2017
[Classification number]: TN912.34