Research on Acoustic Modeling for Spontaneous Speech Recognition
Source: Beijing Institute of Technology, doctoral dissertation, 2014. Document type: degree thesis
Keywords: continuous speech recognition; acoustic model; speaker adaptation; discriminative training; discriminative linear transform
[Abstract]: Acoustic modeling is one of the key problems in speech recognition, and its accuracy directly affects the performance of a recognition system. How to build more accurate acoustic models has long been a focus of research. With the main goal of improving the accuracy of acoustic model parameters and the performance of continuous speech recognition systems, this thesis studies two topics: the estimation of triphone model parameters before state clustering during acoustic model training, and acoustic model adaptation.
First, to improve the accuracy of decision-tree state clustering in Mandarin continuous speech recognition, the thesis studies how to optimize the triphone models used before state clustering. The construction of the decision tree depends closely on the accuracy of the triphone model parameters it uses. The training corpus contains many sparse (rarely observed) triphones, so training the pre-clustering triphone models suffers from data sparsity. To address this, the thesis proposes estimating the pre-clustering triphone model parameters with the maximum a posteriori (MAP) criterion. MAP estimation is also sensitive to the initial model parameters. Sets of tonal initial/final triphones that differ only in tone are more similar to each other than sets of tonal initial/final triphones that merely share the same center phone. The parameters of the toneless initial/final triphone models are therefore used to initialize the tonal initial/final triphone models, which improves the accuracy of their initial parameters. Together, these strategies improve the recognition performance of the system.
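As an illustration of why MAP estimation helps with sparse triphones, the sketch below shows the standard MAP update for a Gaussian mean under a conjugate prior with fixed covariance. It is not taken from the thesis: the function name `map_mean`, the prior weight `tau`, and the toy numbers are assumptions chosen for illustration, with the prior mean standing in for the toneless initial/final triphone model.

```python
def map_mean(prior_mean, frames, occupancies, tau=10.0):
    """MAP estimate of a Gaussian mean.

    prior_mean  : mean of the prior (assumed here: the toneless triphone model)
    frames      : list of feature vectors aligned to this state
    occupancies : per-frame state occupation probabilities
    tau         : prior weight (hypothetical value; larger trusts the prior more)
    """
    total = sum(occupancies)
    dim = len(prior_mean)
    # Occupancy-weighted sum of the observed frames, per dimension.
    weighted = [
        sum(g * x[d] for g, x in zip(occupancies, frames))
        for d in range(dim)
    ]
    # Interpolate between the prior mean and the data according to the counts.
    return [(tau * prior_mean[d] + weighted[d]) / (tau + total) for d in range(dim)]


# A sparse triphone with only two frames is pulled toward the prior.
sparse = map_mean([0.0, 0.0], [[2.0, 2.0], [4.0, 4.0]], [1.0, 1.0], tau=2.0)
print(sparse)  # [1.5, 1.5]; the ML estimate would be [3.0, 3.0]
```

With no data the estimate equals the prior mean, and as the occupancy grows it approaches the maximum likelihood estimate, which is why a well-chosen prior matters most for rarely observed triphones.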
Second, the thesis studies discriminative MAP adaptation. The minimum phone error MAP (MPE-MAP) algorithm incorporates prior information into discriminative training and thereby achieves discriminative adaptation of the acoustic model. The accuracy of the hyperparameters of the prior distribution strongly affects the performance of MPE-MAP. To address this, the hyperparameters are estimated with two adaptation algorithms: maximum mutual information MAP (MMI-MAP), and H-criterion MAP (H-MAP), where the H-criterion combines the maximum mutual information criterion with the maximum likelihood criterion. This yields the proposed MPE-MMI-MAP and MPE-H-MAP algorithms. Both improve the adapted model by making the hyperparameters more accurate, which in turn improves adaptation performance.
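For orientation, the H-criterion is commonly written as a fixed interpolation of the MMI and ML objectives. The form below is a sketch of that common formulation and not necessarily the exact one used in the thesis. The interpolation weight $\alpha$ and the symbols $O_r$ (observations of utterance $r$), $s_r$ (its reference transcription), and $\lambda$ (model parameters) are notation assumed here.

```latex
\begin{align}
\mathcal{F}_{\mathrm{ML}}(\lambda)  &= \sum_{r} \log p_\lambda(O_r \mid s_r), \\
\mathcal{F}_{\mathrm{MMI}}(\lambda) &= \sum_{r} \log
    \frac{p_\lambda(O_r \mid s_r)\,P(s_r)}{\sum_{s} p_\lambda(O_r \mid s)\,P(s)}, \\
\mathcal{F}_{H}(\lambda) &= \alpha\,\mathcal{F}_{\mathrm{MMI}}(\lambda)
    + (1-\alpha)\,\mathcal{F}_{\mathrm{ML}}(\lambda), \qquad 0 \le \alpha \le 1 .
\end{align}
```

Under this convention $\alpha = 1$ recovers MMI and $\alpha = 0$ recovers ML.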
Third, the thesis studies discriminative linear transform adaptation. I-smoothing is very important for discriminative linear transform adaptation; it is implemented by adding the log prior distribution of the transform matrix to the discriminative objective function. This thesis instead uses a prior distribution over the means to smooth the discriminative linear transform, and proposes a smoothing method based on a mean prior. If the hyperparameters of the mean prior are defined using maximum likelihood (ML) statistics, the result is the same as I-smoothing. Because adaptation scenarios often provide very little data, ML-estimated parameters are not accurate enough. The thesis therefore proposes defining the hyperparameters of the prior with MAP-estimated statistics, which improves the performance of the discriminative linear transform when little adaptation data is available. In addition, to combine discriminative estimation with MAP estimation, the thesis designs a new objective function for estimating the linear transform parameters and proposes a discriminative maximum a posteriori linear regression adaptation algorithm. Experimental results show that this algorithm improves adaptation performance with small amounts of adaptation data and still matches the performance of the discriminative linear transform with large amounts of adaptation data.
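To make the link between a mean prior and I-smoothing concrete, here is a minimal scalar sketch of an extended Baum-Welch style mean update with a smoothing term centered on a prior mean. It illustrates the idea for a single Gaussian mean rather than for the linear transform the thesis actually estimates. All names (`smoothed_mean_update`, `map_center`, `D`, `tau`, `tau_map`) and all numbers are assumptions.

```python
def smoothed_mean_update(num_occ, num_sum, den_occ, den_sum,
                         old_mean, prior_mean, D, tau):
    """EBW-style mean update with tau pseudo-counts centered on prior_mean."""
    numerator = num_sum - den_sum + D * old_mean + tau * prior_mean
    denominator = num_occ - den_occ + D + tau
    return numerator / denominator


def ml_mean(num_occ, num_sum):
    """ML mean from numerator (reference) statistics."""
    return num_sum / num_occ


def map_center(num_occ, num_sum, si_mean, tau_map):
    """MAP mean that backs off to a speaker-independent mean when data is scarce."""
    return (tau_map * si_mean + num_sum) / (tau_map + num_occ)


# Toy statistics for one Gaussian (hypothetical values).
num_occ, num_sum = 4.0, 8.0   # numerator occupancy and first-order sum
den_occ, den_sum = 2.0, 2.0   # denominator occupancy and first-order sum
old_mean, D, tau = 1.0, 2.0, 4.0

# Prior centered on the ML mean: equivalent to I-smoothing.
i_smoothed = smoothed_mean_update(num_occ, num_sum, den_occ, den_sum,
                                  old_mean, ml_mean(num_occ, num_sum), D, tau)

# Prior centered on a MAP mean: the variant aimed at small adaptation sets.
map_smoothed = smoothed_mean_update(num_occ, num_sum, den_occ, den_sum,
                                    old_mean,
                                    map_center(num_occ, num_sum, 0.0, 4.0),
                                    D, tau)
print(i_smoothed, map_smoothed)  # 2.0 1.5
```

Centering the smoothing term on the ML mean gives the same result as adding `tau` scaled ML statistics to the numerator, which is the usual description of I-smoothing; swapping in a MAP center only changes where the pseudo-counts pull the estimate.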
Finally, the thesis studies the linear projection (LP) adaptation method. The LP function applies linear transforms to multiple initial models to obtain the adapted model, and can be seen as an extension of the linear regression (LR) function. The thesis proposes an LP adaptation method based on transform matrices, which uses speaker-adapted (SA) models as the initial models and represents speaker-specific information with transform matrices. The initial models are selected with a maximum likelihood method, so that the models carrying the most important information are chosen. This reduces the number of parameters to be estimated and yields a fast adaptation algorithm.
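One plausible reading of the LP function is sketched below: the adapted mean of Gaussian $m$ is a sum of linear transforms applied to the corresponding means of $K$ initial models. This is only an illustration of the description above; the symbols $K$, $W_k$, and $\xi_m^{(k)}$ (the extended mean vector of Gaussian $m$ in initial model $k$) are assumed notation, and the thesis may define the function differently.

```latex
\begin{align}
\text{LR:}\quad \hat{\mu}_m &= W\,\xi_m, \qquad
    \xi_m = \begin{bmatrix} 1 \\ \mu_m \end{bmatrix}, \\
\text{LP:}\quad \hat{\mu}_m &= \sum_{k=1}^{K} W_k\,\xi_m^{(k)} .
\end{align}
```

Under this reading, $K = 1$ reduces the LP form to the LR form, and selecting fewer initial models directly lowers the number of transform parameters to estimate.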
[Degree-granting institution]: Beijing Institute of Technology
[Degree level]: Doctoral
[Year degree conferred]: 2014
[Classification number]: TN912.34