
Research on Simultaneous Recognition of Speaker Identity and Speech Content and Its Applications

Published: 2018-04-16 11:41

  Topic: speech content recognition + speaker identity recognition; Source: Jiangnan University, master's thesis, 2015


[Abstract]: With the widespread use of computer technology, speech recognition has become an active research topic. Speech is the most natural mode of human-computer interaction, and speech recognition is the key to it. Certain applications require the speaker's identity and the spoken content to be recognized simultaneously, using recognition algorithms suited to embedded systems such as in-vehicle systems and smart homes. This thesis studies the simultaneous recognition of speaker identity and speech content and applies it to a voice control system in a smart home environment. The main work is as follows:

(1) Endpoint detection and feature extraction for speech signals are studied and used to preprocess the signal. Several common speech adaptation methods are examined, and the mechanism for simultaneous recognition of speaker identity and speech content proposed by Herbig et al. in 2011 is studied in depth as the basis for simultaneous recognition.

(2) Combining ensemble learning with speech recognition, a speech content recognition method based on Bagging and GMM is implemented, which improves both the recognition rate and its stability. For resource-constrained embedded systems, multiple speech content recognition models are combined using SQ (Soft Quantization), which effectively reduces the space complexity of the recognition model and makes the speech content recognition system better suited to embedded environments. Compared with the conventional voting-based ensemble method, this approach also improves the recognition rate and stability of the system when the number of ensemble models is small. To recognize the speaker group and the speech content simultaneously, SQ is used to combine the speaker group models with the speech content recognition models: the optimal decoder for each frame of the speech signal is computed in real time, and a vote is cast for the model with the highest SQ score. Speaker group recognition is completed by comparing the models' vote shares, while speech content recognition is carried out by the optimal decoder. In the experiments, with six ensemble speech content recognition models, the average speech content recognition rate was 88% and the average speaker group recognition rate was 81.56%. The results confirm that simultaneous recognition of speaker group and speech content is feasible in specific application settings.

(3) Using the simultaneous speaker group and speech content recognition algorithm, a system for simultaneous recognition of speaker identity and speech content in a smart home environment is implemented. In the experiments, with five ensemble speech content recognition models, the speech content recognition rate reached 96.64% and the speaker group recognition rate was 88.24%. The results show that the method is suitable for simultaneous recognition of speaker identity and speech content in a smart home environment.
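
The per-frame voting step described in the abstract can be illustrated with a minimal sketch. It assumes each frame has already been scored against every speaker group model. The abstract does not say how the SQ score is computed, so the scores here are placeholder numbers (for example, log-likelihoods), and the function name `vote_speaker_group` and the group labels are hypothetical. This is not the thesis implementation.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def vote_speaker_group(
    frame_scores: Sequence[Sequence[float]],
    group_names: Sequence[str],
) -> Tuple[str, Dict[str, float]]:
    """Pick the speaker group whose model wins the largest share of frames.

    frame_scores[t][k] is an assumed per-frame score of frame t under
    speaker group model k (higher is better). How the thesis derives its
    SQ score is not given in the abstract, so any score can be plugged in.
    """
    if not frame_scores:
        raise ValueError("need at least one frame")

    votes: Counter = Counter()
    for scores in frame_scores:
        if len(scores) != len(group_names):
            raise ValueError("one score per group model is required")
        # The best-scoring model for this frame receives one vote.
        best = max(range(len(scores)), key=lambda k: scores[k])
        votes[group_names[best]] += 1

    # Convert raw vote counts into vote shares and pick the largest.
    total = len(frame_scores)
    shares = {name: votes[name] / total for name in group_names}
    winner = max(group_names, key=lambda name: shares[name])
    return winner, shares


# Four frames scored against two hypothetical speaker group models.
example_scores = [[-4.1, -3.2], [-2.9, -3.5], [-5.0, -4.4], [-3.8, -3.1]]
winner, shares = vote_speaker_group(example_scores, ["group_a", "group_b"])
print(winner, shares)
```

In this toy input the second model wins three of the four frames, so it is chosen with a vote share of 0.75. In the scheme the abstract describes, the same per-frame winner would also select the decoder used for content recognition, which this sketch leaves out.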
[Degree-granting institution]: Jiangnan University
[Degree level]: Master's
[Year degree granted]: 2015
[Classification number]: TN912.34




Article ID: 1758775


Article link: http://sikaile.net/kejilunwen/wltx/1758775.html

