Research on Multi-dimensional Speech Information Recognition Technology (多維語音信息識別技術研究)
[Abstract]: With the growing demand for artificial intelligence and the rapid development of machine learning, voice interaction has become the direction of next-generation smart homes and many other applications, and speech recognition, speaker identification, and speech emotion recognition have attracted increasing attention. At present, research on speech recognition at home and abroad mostly addresses a single dimension of information in isolation. In daily life, however, the speech signals people collect are inherently mixed, carrying three main kinds of information: the linguistic content of the utterance, information related to the speaker's characteristics (such as sex, age, and identity), and the speaker's emotional state, along with background sound. In human dialogue, people recognize all of these kinds of information simultaneously; recognizing each kind separately introduces ambiguity in semantic understanding, reduces the robustness of speech recognition, and hinders the development of spoken dialogue systems. If a machine could, like a person, simultaneously recognize the speaker's identity, age, sex, and emotional state, and even the background sound, it would greatly improve the efficiency of human-computer dialogue and resolve the bottleneck of single-dimension recognition systems. This team has therefore proposed a new research topic: the simultaneous recognition of multi-dimensional speech information. The three aspects above involve nearly ten recognition targets, so simultaneous recognition is very difficult and the scope of the research is very wide. As a pioneering attempt, this thesis therefore first studies multi-dimensional information recognition related to the speaker.
This covers gender-dependent emotion recognition and the recognition of gender and identity in emotional environments. Starting from the block diagram of a single-dimension information recognition system, this thesis analyzes what traditional single-dimension speaker information recognition tasks have in common and where they differ, and focuses on the two key technologies needed to recognize multi-dimensional speaker information simultaneously: feature extraction and model training. (1) Different speech feature parameters represent different speech-related information, yet the same feature vectors can also serve in different single-dimension speech recognition tasks. The acoustic parameters in common use are prosodic features, voice-quality features, and spectral features, which together cover the three speaker-related aspects of information, so this thesis adopts the fusion of these three kinds of acoustic features as the feature parameters for multi-dimensional speaker information recognition; compared with any single category, the fused features carry richer speech information. Two methods are used to obtain the fused features: low-dimensional features extracted on the Matlab simulation platform, and high-dimensional features extracted with the openSMILE toolkit. (2) Since multi-dimensional information recognition lacks mature references and theory, this thesis first constructs a gender-based multi-dimensional information recognition baseline system as a reference model, and then compares it against traditional single-dimension recognition systems.
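The feature-fusion idea above (concatenating prosodic, voice-quality, and spectral streams into one vector) can be sketched as follows. This is an illustrative pure-NumPy reimplementation, not the thesis's Matlab or openSMILE pipeline: the frame sizes, the zero-crossing-rate proxy for voice quality, the spectral-centroid feature, and the mean/std functionals are all assumptions made for the sketch.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    # Slice the waveform into overlapping frames (25 ms / 10 ms at 16 kHz)
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def fused_features(x, sr=16000):
    frames = frame_signal(x)
    # Prosodic stream: log frame energy
    energy = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    # Voice-quality proxy: zero-crossing rate per frame
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    # Spectral stream: centroid of the magnitude spectrum per frame
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
    centroid = spec @ freqs / (np.sum(spec, axis=1) + 1e-10)
    # Fuse the streams by concatenating per-stream mean and std statistics
    streams = [energy, zcr, centroid]
    return np.concatenate([[s.mean(), s.std()] for s in streams])
```

A real system would add pitch, formant, and MFCC streams in the same way; the point of the sketch is only that fusion reduces to concatenating statistics of several acoustic streams into a single feature vector.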
Compared with separate systems for emotion, gender, and identity recognition, the multi-dimensional recognition system's average recognition rate is 11.37% higher, which demonstrates the feasibility and effectiveness of the baseline scheme and shows that multi-dimensional recognition can in turn improve the recognition rate of single-dimension information, itself a new finding. (3) Because multi-dimensional speaker information recognition is essentially a multi-label learning problem, multi-instance multi-label (MIML) learning algorithms are considered for multi-dimensional speech recognition, and the multi-instance multi-label support vector machine (MIMLSVM) is applied to this task for the first time. Experiments show that, except for gender recognition, the recognition rate of the improved MIMLSVM system is higher than that of the baseline system. With high-dimensional features the improved MIMLSVM system's accuracy is lower than with low-dimensional features, yet still about 1.97% above the baseline system. Proper parameter selection and model matching can thus significantly improve the recognition rate of a multi-dimensional system. However, as the number of labels increases, the running time and computational complexity of the system also grow; the improvement comes at the cost of a certain amount of system complexity.
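The standard MIMLSVM degeneration strategy (cluster the bags of instance vectors, represent each bag by its Hausdorff distances to medoid bags, then train one SVM per label) can be sketched as below. Random medoid selection stands in for the full k-medoids step, and the class name `MIMLSVM`, its parameters, and the synthetic data in the usage note are illustrative assumptions, not the thesis's implementation.

```python
import numpy as np
from sklearn.svm import SVC

def hausdorff(A, B):
    # Symmetric max-min (Hausdorff) distance between two bags of instance vectors
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

def bags_to_vectors(bags, medoids):
    # Degenerate MIML to multi-label: one feature per medoid bag
    return np.array([[hausdorff(b, m) for m in medoids] for b in bags])

class MIMLSVM:
    def __init__(self, n_medoids=4, C=1.0):
        self.n_medoids, self.C = n_medoids, C

    def fit(self, bags, Y):
        # Step 1: pick medoid bags (random subset here; k-medoids in the full algorithm)
        rng = np.random.default_rng(0)
        idx = rng.choice(len(bags), self.n_medoids, replace=False)
        self.medoids = [bags[i] for i in idx]
        X = bags_to_vectors(bags, self.medoids)
        # Step 2: one RBF-kernel SVM per label column (one-vs-rest)
        self.svms = [SVC(C=self.C, kernel="rbf").fit(X, Y[:, j])
                     for j in range(Y.shape[1])]
        return self

    def predict(self, bags):
        X = bags_to_vectors(bags, self.medoids)
        return np.stack([s.predict(X) for s in self.svms], axis=1)
```

In the multi-dimensional speaker setting, each utterance would be a bag of frame-level feature vectors and each label column one recognition target (e.g. a gender, emotion, or identity class); the per-label SVMs are where label count drives up the running time noted above.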
【Degree-granting institution】: Nanjing University of Posts and Telecommunications (南京郵電大學)
【Degree level】: Master's
【Year conferred】: 2017
【CLC number】: TN912.34
【相似文獻(xiàn)】
相關(guān)期刊論文 前10條
1 劉文舉,孫兵,鐘秋海;基于說(shuō)話(huà)人分類(lèi)技術(shù)的分級(jí)說(shuō)話(huà)人識(shí)別研究[J];電子學(xué)報(bào);2005年07期
2 丁輝;唐振民;錢(qián)博;李燕萍;;易擴(kuò)展小樣本環(huán)境說(shuō)話(huà)人辨認(rèn)系統(tǒng)的研究[J];系統(tǒng)仿真學(xué)報(bào);2008年10期
3 劉明輝;黃中偉;熊繼平;;用于說(shuō)話(huà)人辨識(shí)的評(píng)分規(guī)整[J];計(jì)算機(jī)工程與應(yīng)用;2010年12期
4 陳雪芳;楊繼臣;;一種三層判決的說(shuō)話(huà)人索引算法[J];計(jì)算機(jī)工程;2012年02期
5 楊繼臣;何俊;李艷雄;;一種基于性別的說(shuō)話(huà)人索引算法[J];計(jì)算機(jī)工程與科學(xué);2012年06期
6 何致遠(yuǎn),胡起秀,徐光yP;兩級(jí)決策的開(kāi)集說(shuō)話(huà)人辨認(rèn)方法[J];清華大學(xué)學(xué)報(bào)(自然科學(xué)版);2003年04期
7 殷啟新,韓春光,楊鑒;基于掌上電腦錄音的說(shuō)話(huà)人辨認(rèn)[J];云南民族學(xué)院學(xué)報(bào)(自然科學(xué)版);2003年04期
8 呂聲,尹俊勛;同語(yǔ)種說(shuō)話(huà)人轉(zhuǎn)換的實(shí)現(xiàn)[J];移動(dòng)通信;2004年S3期
9 董明,劉加,劉潤(rùn)生;快速口音自適應(yīng)的動(dòng)態(tài)說(shuō)話(huà)人選擇性訓(xùn)練[J];清華大學(xué)學(xué)報(bào)(自然科學(xué)版);2005年07期
10 曹敏;王浩川;;說(shuō)話(huà)人自動(dòng)識(shí)別技術(shù)研究[J];中州大學(xué)學(xué)報(bào);2007年02期
相關(guān)會(huì)議論文 前10條
1 司羅;胡起秀;金琴;;完全無(wú)監(jiān)督的雙人對(duì)話(huà)中的說(shuō)話(huà)人分隔[A];第九屆全國(guó)信號(hào)處理學(xué)術(shù)年會(huì)(CCSP-99)論文集[C];1999年
2 金乃高;侯剛;王學(xué)輝;李非墨;;基于主動(dòng)感知的音視頻聯(lián)合說(shuō)話(huà)人跟蹤方法[A];2010年通信理論與信號(hào)處理學(xué)術(shù)年會(huì)論文集[C];2010年
3 馬勇;鮑長(zhǎng)春;夏丙寅;;基于辨別性深度信念網(wǎng)絡(luò)的說(shuō)話(huà)人分割[A];第十二屆全國(guó)人機(jī)語(yǔ)音通訊學(xué)術(shù)會(huì)議(NCMMSC'2013)論文集[C];2013年
4 白俊梅;張樹(shù)武;徐波;;廣播電視中的目標(biāo)說(shuō)話(huà)人跟蹤技術(shù)[A];第八屆全國(guó)人機(jī)語(yǔ)音通訊學(xué)術(shù)會(huì)議論文集[C];2005年
5 索宏彬;劉曉星;;基于高斯混合模型的說(shuō)話(huà)人跟蹤系統(tǒng)[A];第八屆全國(guó)人機(jī)語(yǔ)音通訊學(xué)術(shù)會(huì)議論文集[C];2005年
6 羅海風(fēng);龍長(zhǎng)才;;多話(huà)者環(huán)境下說(shuō)話(huà)人辨識(shí)聽(tīng)覺(jué)線(xiàn)索研究[A];中國(guó)聲學(xué)學(xué)會(huì)2009年青年學(xué)術(shù)會(huì)議[CYCA’09]論文集[C];2009年
7 王剛;鄔曉鈞;鄭方;王琳琳;張陳昊;;基于參考說(shuō)話(huà)人模型和雙層結(jié)構(gòu)的說(shuō)話(huà)人辨認(rèn)快速算法[A];第十一屆全國(guó)人機(jī)語(yǔ)音通訊學(xué)術(shù)會(huì)議論文集(一)[C];2011年
8 李經(jīng)偉;;語(yǔ)體轉(zhuǎn)換與角色定位[A];全國(guó)語(yǔ)言與符號(hào)學(xué)研究會(huì)第五屆研討會(huì)論文摘要集[C];2002年
9 王剛;鄔曉鈞;鄭方;王琳琳;張陳昊;;基于參考說(shuō)話(huà)人模型和雙層結(jié)構(gòu)的說(shuō)話(huà)人辨認(rèn)[A];第十一屆全國(guó)人機(jī)語(yǔ)音通訊學(xué)術(shù)會(huì)議論文集(二)[C];2011年
10 何磊;方棣棠;吳文虎;;說(shuō)話(huà)人聚類(lèi)與模型自適應(yīng)結(jié)合的說(shuō)話(huà)人自適應(yīng)方法[A];第六屆全國(guó)人機(jī)語(yǔ)音通訊學(xué)術(shù)會(huì)議論文集[C];2001年
相關(guān)重要報(bào)紙文章 前8條
1 ;做一名積極的傾聽(tīng)者[N];中國(guó)紡織報(bào);2003年
2 唐志強(qiáng);不聽(tīng)別人說(shuō)話(huà),也能模仿其口音[N];新華每日電訊;2010年
3 atvoc;數(shù)碼語(yǔ)音電路產(chǎn)品概述[N];電子資訊時(shí)報(bào);2008年
4 記者 李山;德用雙音素改進(jìn)人工語(yǔ)音表達(dá)[N];科技日?qǐng)?bào);2012年
5 中國(guó)科學(xué)院自動(dòng)化研究所模式識(shí)別國(guó)家重點(diǎn)實(shí)驗(yàn)室 于劍邋陶建華;個(gè)性化語(yǔ)音生成技術(shù)面面觀[N];計(jì)算機(jī)世界;2007年
6 江西 林慧勇;語(yǔ)音合成芯片MSM6295及其應(yīng)用[N];電子報(bào);2006年
7 記者 邰舉;韓開(kāi)發(fā)出腦電波情感識(shí)別技術(shù)[N];科技日?qǐng)?bào);2007年
8 黃力行邋陶建華;多模態(tài)情感識(shí)別參透人心[N];計(jì)算機(jī)世界;2007年
相關(guān)博士學(xué)位論文 前10條
1 李洪儒;語(yǔ)句中的說(shuō)話(huà)人形象[D];黑龍江大學(xué);2003年
2 李威;多人會(huì)話(huà)語(yǔ)音中的說(shuō)話(huà)人角色分析[D];華南理工大學(xué);2015年
3 楊繼臣;說(shuō)話(huà)人信息分析及其在多媒體檢索中的應(yīng)用研究[D];華南理工大學(xué);2010年
4 鄭建煒;基于核方法的說(shuō)話(huà)人辨認(rèn)模型研究[D];浙江工業(yè)大學(xué);2010年
5 呂聲;說(shuō)話(huà)人轉(zhuǎn)換方法的研究[D];華南理工大學(xué);2004年
6 陳凌輝;說(shuō)話(huà)人轉(zhuǎn)換建模方法研究[D];中國(guó)科學(xué)技術(shù)大學(xué);2013年
7 玄成君;基于語(yǔ)音頻率特性抑制音素影響的說(shuō)話(huà)人特征提取[D];天津大學(xué);2014年
8 李燕萍;說(shuō)話(huà)人辨認(rèn)中的特征參數(shù)提取和魯棒性技術(shù)研究[D];南京理工大學(xué);2009年
9 徐利敏;說(shuō)話(huà)人辨認(rèn)中的特征變換和魯棒性技術(shù)研究[D];南京理工大學(xué);2008年
10 王堅(jiān);語(yǔ)音識(shí)別中的說(shuō)話(huà)人自適應(yīng)研究[D];北京郵電大學(xué);2007年
相關(guān)碩士學(xué)位論文 前10條
1 李?yuàn)?多維語(yǔ)音信息識(shí)別技術(shù)研究[D];南京郵電大學(xué);2017年
2 朱麗萍;說(shuō)話(huà)人聚類(lèi)機(jī)制的設(shè)計(jì)與實(shí)施[D];北京郵電大學(xué);2016年
3 楊浩;基于廣義音素的文本無(wú)關(guān)說(shuō)話(huà)人認(rèn)證的研究[D];北京郵電大學(xué);2008年
4 史夢(mèng)潔;構(gòu)式“沒(méi)有比X更Y的(了)”研究[D];上海師范大學(xué);2015年
5 魏君;“說(shuō)你什么好”的多角度研究[D];河北大學(xué);2015年
6 解冬悅;互動(dòng)韻律:英語(yǔ)多人沖突性話(huà)語(yǔ)中說(shuō)話(huà)人的首音模式研究[D];大連外國(guó)語(yǔ)大學(xué);2015年
7 朱韋巍;揚(yáng)州街上話(huà)語(yǔ)氣詞研究[D];南京林業(yè)大學(xué);2015年
8 蔣博;特定目標(biāo)說(shuō)話(huà)人的語(yǔ)音轉(zhuǎn)換系統(tǒng)設(shè)計(jì)[D];電子科技大學(xué);2015年
9 王雅丹;漢語(yǔ)反語(yǔ)研究[D];南昌大學(xué);2015年
10 陳雨鶯;基于EMD的說(shuō)話(huà)人特征參數(shù)提取方法研究[D];湘潭大學(xué);2015年
,本文編號(hào):2173801
本文鏈接:http://sikaile.net/kejilunwen/xinxigongchenglunwen/2173801.html