Research on the Application of Speech Recognition Technology in Human-Computer Interaction
Keywords: Mel-frequency cepstral coefficients + spectral subtraction; Source: North China University of Technology, Master's thesis, 2017
【Abstract】: After discussing the prospects and role of speech recognition technology in human-computer interaction, this thesis studies the relevant techniques of speech recognition in depth. First, it examines the main stages of a speech recognition system and, for the endpoint-detection stage, proposes a single-threshold endpoint detection method based on the cosine similarity of MFCC parameters. Second, to improve the quality of the feature parameters, it studies related speech enhancement algorithms and proposes a speech enhancement method that fuses the LMS algorithm with spectral subtraction. Finally, it proposes a speech recognition method combining the two algorithms, verifies its reliability under experimental conditions, and uses it to implement an interactive software application driven by voice. The main contents and contributions are as follows:
1) Endpoint detection is one of the most important stages of speech recognition. After surveying a large number of endpoint detection algorithms, and building on the double-threshold endpoint detection algorithm based on the Euclidean distance of MFCC parameters, this thesis proposes a single-threshold endpoint detection algorithm based on the cosine similarity of MFCC parameters, named MFCC_COS. The algorithm distinguishes speech segments from non-speech segments by computing the cosine of the MFCC vectors and applies a single threshold for the decision. It avoids the numerical sensitivity of the Euclidean distance and reduces the probability that a pair of thresholds enlarges the error. The algorithm is simple to implement, performs well in noisy environments, and its detection accuracy does not degrade too quickly as the noise intensity increases; compared with traditional algorithms, it is clearly more robust.
2) Noise in real environments is unavoidable, so this thesis enhances the speech signal before feature extraction, mainly to obtain higher-quality feature parameters in the feature-extraction stage. After studying a large number of speech enhancement methods, it proposes a speech enhancement method that fuses spectral subtraction with the LMS algorithm, named LMSSS. The algorithm removes the musical noise left behind by spectral subtraction and also avoids the filter-delay problem of the LMS algorithm. Experiments show that its denoising performance exceeds that of spectral subtraction or the LMS algorithm used alone, and within the experimental range, the stronger the noise, the more pronounced the advantage.
3) In its final part, the thesis proposes a speech recognition method combining the LMSSS and MFCC_COS algorithms. Experiments show that recognition accuracy is further improved after integrating the LMSSS algorithm, and that the method is robust in noisy environments. In addition, a speech recognition interactive software application is implemented based on this method.
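The MFCC_COS decision rule summarized in point 1) can be sketched as follows. This is a minimal illustration, assuming per-frame MFCC vectors have already been extracted; the reference noise vector and the threshold value of 0.9 are hypothetical, since the abstract does not give the thesis's actual threshold-setting procedure.

```python
import numpy as np

def mfcc_cos_vad(frames_mfcc, noise_mfcc, threshold=0.9):
    """Single-threshold endpoint detection on MFCC cosine similarity.

    A frame whose MFCC vector is highly similar (cosine > threshold) to a
    reference noise MFCC vector is labelled non-speech; otherwise speech.
    The threshold value here is illustrative, not the thesis's.
    """
    ref = noise_mfcc / np.linalg.norm(noise_mfcc)            # unit reference vector
    sims = frames_mfcc @ ref / np.linalg.norm(frames_mfcc, axis=1)
    return sims < threshold                                   # True = speech frame

# Toy demo: the first "frame" resembles the noise reference, the second does not.
mask = mfcc_cos_vad(np.array([[1.0, 0.1], [0.0, 1.0]]), np.array([1.0, 0.0]))
```

Using the cosine rather than the Euclidean distance makes the decision insensitive to the overall magnitude of the MFCC vectors, which matches the numerical-sensitivity argument made in the abstract.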
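The two building blocks that LMSSS fuses, basic magnitude spectral subtraction and a standard LMS adaptive filter, can be sketched as follows. The fusion strategy itself is not detailed in the abstract, so only the individual components are shown; the filter order and step size are illustrative values.

```python
import numpy as np

def spectral_subtract(frame, noise_mag):
    """Basic spectral subtraction on a single frame: subtract an estimated
    noise magnitude spectrum, floor at zero, and keep the noisy phase."""
    spec = np.fft.rfft(frame)
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(frame))

def lms(desired, reference, order=4, mu=0.01):
    """Standard LMS adaptive filter; returns the error signal, which serves
    as the de-noised estimate when `reference` is correlated with the noise."""
    w = np.zeros(order)
    err = np.zeros(len(desired))
    for n in range(order, len(desired)):
        x = reference[n - order:n][::-1]   # most recent samples first
        e = desired[n] - w @ x             # prediction error
        w += 2 * mu * e * x                # gradient-descent weight update
        err[n] = e
    return err

# Demo: subtracting a zero noise spectrum reconstructs the frame exactly,
# and the LMS error shrinks as the filter learns to predict a sinusoid.
frame = np.sin(0.3 * np.arange(64))
clean = spectral_subtract(frame, 0.0)
sig = np.sin(0.3 * np.arange(512))
err = lms(sig, sig)
```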
【Degree-granting institution】: North China University of Technology
【Degree level】: Master's
【Year conferred】: 2017
【CLC number】: TN912.34