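Research on Simultaneous Recognition of Speech Identity and Content and Its Applications
Published: 2018-04-16 11:41
Topics of this paper: speech content recognition + speech identity recognition; Reference: Jiangnan University, 2015 master's thesis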
[Abstract]: With the widespread use of computer technology, speech recognition has become an active research topic. Speech is the most natural mode of human-computer interaction, and speech recognition is the key to spoken human-computer interaction. Some applications need to recognize the speaker's identity and the spoken content at the same time, and require a recognition algorithm suited to embedded systems such as in-vehicle systems and smart homes. This thesis studies the simultaneous recognition of speech identity and content and applies it to a voice control system in a smart home environment. The main work is as follows:
(1) Endpoint detection and feature extraction for speech signals are studied and used to preprocess the speech signal. Several common speech adaptation methods are examined, and the mechanism for simultaneous recognition of speech identity and content proposed by Herbig et al. in 2011 is studied in depth and used to realize simultaneous recognition.
(2) Ensemble learning is combined with speech recognition to implement a speech content recognition method based on Bagging and GMM, which improves both the speech content recognition rate and its stability. For resource-constrained embedded systems, multiple speech content recognition models are integrated on the basis of SQ (Soft Quantization), which effectively reduces the space complexity of the recognition model and makes the speech content recognition system better suited to embedded environments. Compared with the traditional voting-based ensemble method, this method also improves the recognition rate and stability of the system when the number of integrated models is small. To recognize the speaker group and the speech content simultaneously, SQ is used to integrate the speaker group models with the speech content recognition models: the optimal decoder for each frame of the speech signal is computed in real time, and the model with the highest SQ score receives a vote. Speaker group recognition is completed by comparing the models' vote shares, and speech content recognition is completed with the optimal decoder. In the experiments, when the number of integrated speech content recognition models reaches 6, the average speech content recognition rate is 88% and the average speaker group recognition rate is 81.56%. The results confirm the feasibility of simultaneous speaker group and speech content recognition in specific applications.
(3) Using the simultaneous speaker group and speech content recognition algorithm, a system for simultaneous recognition of speech identity and content in a smart home environment is implemented. In the experiments, when the number of integrated speech content recognition models reaches 5, the speech content recognition rate reaches 96.64% and the speaker group recognition rate is 88.24%. The results show that the method is suitable for simultaneous recognition of speech identity and content in a smart home environment.
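The frame-level voting described in item (2) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not define the SQ score, so the sketch assumes a plain GMM log-likelihood as a stand-in, and the names `frame_score`, `vote_speaker_group`, the two toy models, and the toy frames are all hypothetical. Only the voting step is shown; the choice of an optimal decoder per frame is left out.

```python
import math
from collections import Counter


def log_gauss(frame, mean, var):
    """Log-density of a diagonal Gaussian evaluated at one feature frame."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mean, var)
    )


def frame_score(frame, model):
    """Log-likelihood of a frame under a GMM given as (weight, mean, var) triples.

    Assumption: this stands in for the SQ score, which the abstract does not define.
    """
    logs = [math.log(w) + log_gauss(frame, m, v) for w, m, v in model]
    top = max(logs)
    # Log-sum-exp over the mixture components, for numerical stability.
    return top + math.log(sum(math.exp(value - top) for value in logs))


def vote_speaker_group(frames, models):
    """Let each frame vote for its best-scoring model; return (winner, vote shares)."""
    if not frames:
        raise ValueError("at least one frame is required")
    votes = Counter()
    for frame in frames:
        best = max(models, key=lambda name: frame_score(frame, models[name]))
        votes[best] += 1
    shares = {name: votes[name] / len(frames) for name in models}
    winner = max(shares, key=shares.get)
    return winner, shares


# Hypothetical single-component models for two speaker groups (2-D features).
models = {
    "group_a": [(1.0, [0.0, 0.0], [1.0, 1.0])],
    "group_b": [(1.0, [3.0, 3.0], [1.0, 1.0])],
}
# Hypothetical feature frames; three lie near group_a, one near group_b.
frames = [[0.1, -0.2], [0.4, 0.3], [2.9, 3.1], [-0.5, 0.2]]

winner, shares = vote_speaker_group(frames, models)
print(winner, shares)
```

Deciding by vote share rather than by summed scores means a few frames with extreme scores cannot dominate the decision, which is one plausible reason to vote per frame.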
[Degree-granting institution]: Jiangnan University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TN912.34
Article ID: 1758775
Article link: http://sikaile.net/kejilunwen/wltx/1758775.html