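Design and Research of a VC-Based Advertisement Speech Recognition System
Published: 2018-03-01 15:36
Keywords: speech recognition; feature extraction; linear prediction cepstral coefficients; Mel-frequency cepstral coefficients; K-means; dynamic time warping. Source: Nanjing University of Science and Technology, 2007 master's thesis. Document type: degree thesis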
[Abstract]: As the economy has developed, television advertising has become an increasingly important part of social life, and the social problems it brings have grown more prominent. False advertising in particular seriously misleads consumers and harms the public, so advertisement monitoring has become a problem that society urgently needs to address. This thesis studies the software implementation of advertisement speech recognition. Starting from the basic principles and workflow of speech recognition, it introduces the principles and computation methods of speech endpoint detection, speech feature extraction, speech modeling, and model matching. For endpoint detection of advertisement speech, it introduces short-time energy and the short-time zero-crossing rate and, based on simulation results, presents a double-threshold endpoint detection method suited to this system. For feature extraction, it introduces the speech cepstrum and two commonly used feature sets, linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC). For modeling and matching, it applies K-means clustering, vector quantization, and the dynamic time warping (DTW) algorithm to cope with the excessive volume of feature data and with extra or dropped frames in advertisement audio. These algorithms were simulated in MATLAB, their characteristics and parameter-selection methods were compared and analyzed, and a modeling and recognition method suited to this project is presented. Building on this work, recognition tests on designated advertisement speech were carried out in the VC environment. The experiments indicate that the system has some practical value for advertisement monitoring.
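The abstract names DTW as the matching step that tolerates extra or dropped frames. The following is a minimal Python sketch of the textbook DTW recurrence, not the thesis's implementation: the function name `dtw_distance`, the Euclidean frame distance, the unconstrained three-way step pattern, and the toy two-dimensional "frames" are all assumptions made for illustration, since the abstract does not state the local constraints or distance measure actually used.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Hypothetical helper for illustration; uses the basic step pattern
    (insertion, deletion, match) with no slope or window constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances alone (b's frame is reused)
                cost[i][j - 1],      # b advances alone (a's frame is reused)
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Toy data (assumed): a template and two distorted copies of it.
template = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
repeated = [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
dropped = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0)]

print(dtw_distance(template, repeated))
print(dtw_distance(template, dropped))
```

In this toy case the copy with a repeated frame aligns to the template at zero cost, because the warping path can reuse a template frame, while the copy with a dropped frame pays only for the one template frame that has no exact counterpart. A frame-by-frame comparison would instead misalign every frame after the distortion.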
[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2007
[Classification number]: TN912.34
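[Citing documents]
Related master's theses (first 2)
1 Kang Ming; Research on Speech Recognition and Full-Text Retrieval Based on Vector Quantization [D]; Chongqing University; 2009
2 Zhang Tao; Design and Implementation of an FPGA-Based Speech Denoising System Using the Wavelet Lifting Algorithm [D]; Guangxi Normal University; 2012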
Article ID: 1552569
Article link: http://sikaile.net/wenyilunwen/guanggaoshejilunwen/1552569.html