Speech Recognition of Two-Character Chinese Words Based on the Fourier Transform of Spectrograms
Published: 2018-06-21 06:34
Topics: speech recognition + spectrogram. Source: master's thesis, Northeast Normal University, 2017
[Abstract]: This thesis applies a Fourier transform to wideband and narrowband spectrograms to obtain band-wise projection features of the resulting frequency-domain images, and fuses the wideband and narrowband features into a speech recognition algorithm for two-character Chinese words. Unlike earlier algorithms that recognize the speech signal frame by frame, it uses the spectrogram as a whole and recognizes speech word by word, which brings out the overall time-frequency characteristics of the signal. The method treats the spectrogram as a visual image and recognizes speech with image recognition techniques. A spectrogram expresses speech characteristics through its texture, and image texture is more readily described in the image frequency domain, so the wideband and narrowband spectrograms are Fourier transformed a second time, taking each spectrogram image from the spatial domain to the frequency domain, where recognition is performed.
The algorithm was studied, programmed, simulated, and implemented in MATLAB R2013a. Recorded speech samples were first preprocessed with Cool Edit Pro 2.0 and then quantized and normalized. MATLAB R2013a was then used to construct wideband and narrowband spectrograms by Fourier time-frequency analysis and to Fourier transform them again. The resulting frequency-domain images were divided into bands whose widths double successively, row and column projections were computed for each band, and a support vector machine (SVM) performed the recognition. Simulation experiments show a recognition rate of up to 96.8% for speaker-dependent and up to 98.8% for speaker-independent two-character Chinese word speech, offering a new approach to whole-word recognition of two-character Chinese words.
Because the wavelet transform is a time-frequency analysis method in which both the time window and the frequency window can vary, the thesis also tries to construct wavelet spectrograms for the same task. Because recording a large number of samples is laborious, it further attempts speaker-independent recognition from a single template. Various problems arose in practice, however, the experimental results were not satisfactory, and further research and discussion are needed.
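The feature extraction described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the thesis's MATLAB implementation. All function names are hypothetical. The 16 kHz sampling rate, the 64-sample (wideband) and 512-sample (narrowband) windows, the hop size, the number of bands, and the exact band layout (widths 1, 2, 4, ... counted outward from zero frequency) are assumptions, since the abstract does not specify them.

```python
import numpy as np

def spectrogram(signal, win_len, hop=64, n_fft=512):
    """Log-magnitude STFT image of shape (n_fft // 2 + 1, n_frames)."""
    window = np.hamming(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + win_len] * window for i in range(n_frames)],
        axis=1,
    )
    # Zero-pad every frame to n_fft so both spectrograms share one frequency axis.
    magnitude = np.abs(np.fft.rfft(frames, n=n_fft, axis=0))
    return np.log10(magnitude + 1e-10)

def dyadic_bands(profile, n_bands):
    """Sum a 1-D projection over bands of width 1, 2, 4, ... from the centre."""
    centre = len(profile) // 2  # zero-frequency index after fftshift
    edges = centre + np.concatenate(([0], np.cumsum(2 ** np.arange(n_bands))))
    if edges[-1] > len(profile):
        raise ValueError("image too small for the requested number of bands")
    return np.array([profile[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])

def band_projection_features(image, n_bands=5):
    """Second (2-D) Fourier transform of a spectrogram, then banded projections."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(image)))
    row_profile = spectrum.sum(axis=1)  # row projection: one value per row
    col_profile = spectrum.sum(axis=0)  # column projection: one value per column
    feats = np.concatenate((dyadic_bands(row_profile, n_bands),
                            dyadic_bands(col_profile, n_bands)))
    return feats / feats.sum()  # normalise so overall loudness cancels out

def word_features(signal, wide_win=64, narrow_win=512):
    """Fuse wideband (short window) and narrowband (long window) features."""
    wide = band_projection_features(spectrogram(signal, wide_win))
    narrow = band_projection_features(spectrogram(signal, narrow_win))
    return np.concatenate((wide, narrow))

# Synthetic one-second test signal standing in for a recorded word.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
demo = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(fs)
features = word_features(demo)
```

Each utterance yields one fixed-length vector (20 values with these settings) regardless of its duration, which is what allows word-level rather than frame-level classification. The vectors could then be passed to an SVM classifier, for example `sklearn.svm.SVC` in scikit-learn. That choice is also an assumption, not something the abstract states.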
[Degree-granting institution]: Northeast Normal University
[Degree level]: Master's
[Year degree awarded]: 2017
[Classification number]: TN912.34
Article ID: 2047602
Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/2047602.html