Research on a Neural-Network-Based Chinese Speech Recognition System for Multiple Dialect Accents

Published: 2018-08-20 13:53

[Abstract]: As the Internet and mobile devices have grown more capable, speech recognition has found increasing use in industry and everyday life, which makes a recognizer with high accuracy and strong robustness increasingly important. Ever since speech recognition was first applied to Chinese, however, one phenomenon has been impossible to ignore: accent. Accent substantially degrades recognizer performance, and because China is a country of many accent regions, accent, and multi-accent speech in particular, has become a key problem for Chinese speech recognition.

Besides standard Putonghua (Mandarin), China has seven major dialect groups: Guanhua (Mandarin dialects), Wu, Yue (Cantonese), Xiang, Hakka, Min, and Gan. The dialects differ greatly from standard Putonghua and from one another. People in these dialect regions learn Putonghua as a second language, so their spoken Putonghua carries considerable accent variation. As a result, models trained on standard Putonghua data do not recognize speech from a specific dialect region well.

Accented Putonghua recognition is difficult in two main respects. First, pronunciation variation across accent regions causes a mismatch between model and speech. Second, training accent-specific models requires large speech corpora from each accent region, which are hard to obtain.

This thesis addresses both problems by combining accent classification with improving accent-specific acoustic models. Accent classification selects a suitable acoustic model for accented Putonghua test data whose region is unknown, which addresses the model mismatch. A multi-level adaptive network (MLAN) then improves recognition by the accent-specific model, further alleviating the model mismatch and the modeling difficulty caused by sparse accent-specific data, and thus raising the recognition rate.

The MLAN system exploits the discriminative learning ability of neural networks and their ability to adapt across data domains. A first-level network adaptively extracts what the larger body of standard Putonghua data and the accent-specific data have in common. The accent-specific data is then fed forward through the first-level network to train a second-level network, and the standard Putonghua data is fed forward as well, so that it takes on the characteristics of the specific accent. This architecture improves how well the accent-specific data represents the shared characteristics, and it also adapts the large body of standard Putonghua data to the accent, greatly increasing the amount of training data that carries the accent's features.

Experiments on data from Guangzhou and Chongqing show that, relative to a baseline GMM-HMM system, the proposed system reduces CER by 23.03% and 21.21% (relative), respectively. These results support both the need for accent classification on test data of unknown accent and the effectiveness of the MLAN framework. The architecture is extensible and adaptable: beyond multi-accent speech recognition, it also suits more complex cross-domain settings and finer-grained classification, such as multiple languages or complex noise conditions.
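
The accent-classification step described in the abstract can be pictured as a routing decision: score the utterance against each known accent, then decode it with the acoustic model for the best-scoring accent. The sketch below is only an illustration under stated assumptions, not the thesis's implementation. The function names, the accent labels, the fallback to a standard-Putonghua model, and the stub classifier and recognizers are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Features = List[List[float]]                      # frames x feature dimensions
Classifier = Callable[[Features], Dict[str, float]]
Recognizer = Callable[[Features], str]


def pick_accent(posteriors: Dict[str, float]) -> str:
    """Return the accent label with the highest classifier score."""
    return max(posteriors, key=posteriors.get)


def recognize(
    features: Features,
    classify: Classifier,
    recognizers: Dict[str, Recognizer],
    fallback: str = "standard",
) -> Tuple[str, str]:
    """Route an utterance to the acoustic model matching its predicted accent.

    If no accent-specific model exists for the predicted label, the
    (hypothetical) standard-Putonghua model is used instead.
    """
    accent = pick_accent(classify(features))
    model = recognizers.get(accent, recognizers[fallback])
    return accent, model(features)


# --- Stand-ins for illustration only; a real system would plug in trained models.
def toy_classifier(features: Features) -> Dict[str, float]:
    return {"guangzhou": 0.7, "chongqing": 0.2, "standard": 0.1}


recognizers: Dict[str, Recognizer] = {
    "standard": lambda f: "<standard-model output>",
    "guangzhou": lambda f: "<guangzhou-model output>",
    "chongqing": lambda f: "<chongqing-model output>",
}

print(recognize([[0.0]], toy_classifier, recognizers))
```

Keeping the classifier and the recognizers behind plain callables means further accent regions could be added by registering another entry in the dictionary, which is one way to read the abstract's claim that the architecture extends to finer-grained classification.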
[Degree-granting institution]: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
[Degree]: Master's
[Year conferred]: 2014
[Classification number]: TP183;TN912.34
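
As a reading aid for the relative CER figures quoted in the abstract, the sketch below shows how a character error rate and a relative reduction are conventionally computed. It assumes the standard definition (character-level edit distance divided by reference length), which the abstract does not spell out. The example numbers are hypothetical and are not the thesis's absolute error rates.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent, relative to the reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def relative_cer_reduction(baseline_cer: float, improved_cer: float) -> float:
    """Relative reduction in percent: (baseline - improved) / baseline."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return 100.0 * (baseline_cer - improved_cer) / baseline_cer


# Hypothetical numbers: a baseline CER of 20% lowered to 15%
# is a 25% relative reduction (5 points absolute).
print(relative_cer_reduction(20.0, 15.0))
```

A relative reduction is larger than the absolute drop in percentage points, so the 23.03% and 21.21% figures should be read against the baseline system's own error rate.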

Article ID: 2193850

Article link: http://sikaile.net/kejilunwen/wltx/2193850.html
