Research on the IMFE Feature Extraction Algorithm and Fused KELM Recognition Algorithm for Speech Emotion Recognition

Posted: 2018-01-08 07:28

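Keywords: Research on the IMFE Feature Extraction Algorithm and Fused KELM Recognition Algorithm for Speech Emotion Recognition. Source: Taiyuan University of Technology, master's thesis, 2017. Type: degree thesis.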

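【Abstract】: Speech is a complex signal that carries both linguistic content and the speaker's emotional state, and it is an effective means for humans to communicate and express emotion. Speech emotion recognition is an information-processing technique in which a computer extracts and analyzes feature parameters of emotional speech to determine the emotion category; it is important for making human-computer interaction more intelligent. Against this background, this thesis reviews commonly used speech corpora, emotional features, and recognition networks. It applies the Ensemble Empirical Mode Decomposition (EEMD) algorithm to speech emotion feature extraction, extracting an intrinsic mode function energy feature (IMFE) and a marginal spectrum amplitude feature (MSA); it selects three emotional features (IMFE, prosodic features, and MFCC) for feature-level fusion; and it proposes a decision-level fusion method based on adaptively fused Extreme Learning Machines with Kernel (KELM) for speech emotion recognition. The main work of this thesis is as follows:

(1) EEMD, a processing method for nonlinear, non-stationary signals, is used to extract emotional speech features. Traditional emotional feature extraction methods all assume that speech is a short-time stationary signal. To overcome this limitation, the thesis extracts the marginal spectrum amplitude feature MSA from the EEMD decomposition of the speech signal and uses KELM as the recognition network. Simulation experiments on the Berlin emotional speech database recognize four emotions (happiness, sadness, anger, and neutral), and comparison with the recognition results of prosodic and MFCC features confirms the effectiveness of MSA.

(2) A feature extraction method based on EEMD is proposed and applied to speech emotion recognition. The emotional speech signal is decomposed by EEMD into a set of intrinsic mode functions (IMFs); the effective IMF components are selected with the Spearman rank correlation coefficient, and a new speech emotion feature, IMFE, is obtained by computing their energy. Recognition experiments on the Berlin database, compared against prosodic and MFCC features, show that IMFE recognizes emotion effectively and gives the best recognition results for negative emotions.

(3) Feature-level data fusion is applied to speech emotion recognition. Because a single speech emotion feature does not recognize emotions well on its own, the IMFE, prosodic, and MFCC features are fused: different combinations of the three features are fed into the classifier, simulated on the Berlin database, and compared with the single-feature results. Feature fusion improves recognition performance to some extent, which shows that the three features are complementary; however, because the feature vectors are simply concatenated, the fused features yield lower recognition rates than single features for some emotions.

(4) A new speech emotion recognition method based on fused KELM is proposed. To address the poor performance of a single feature with a single classifier, decision-level data fusion is applied. First, the three speech emotion features are extracted and a separate classifier is trained on each, and the numerical outputs of every classifier are converted into probability outputs. Next, adaptive weights for the test set are obtained through a designed decision strategy that depends on the probability matrix. Finally, the output probabilities of the individual classifiers are linearly weighted and the final decision is made. Experiments on the Berlin database show that fused KELM achieves the best recognition rate both for each individual emotion and overall, outperforming single features, feature fusion, and common decision strategies, making it an effective method for speech emotion recognition.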
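
Contribution (1) builds the MSA feature on the Hilbert marginal spectrum of the EEMD components. The abstract gives no formulas, so the following is only a minimal sketch assuming the textbook Hilbert-Huang definition h(f) = ∫ A(f, t) dt: each IMF is turned into an analytic signal with a DFT-based Hilbert transform, and its instantaneous amplitude is accumulated into bins of instantaneous frequency. The EEMD step itself is not shown (`imfs` is assumed to be its output), and the function names, the bin count `n_bins`, and the naive O(N²) DFT are illustrative choices, not the thesis's implementation.

```python
import cmath
import math


def analytic_signal(x):
    """Return x + j*H{x}, computed with a plain DFT (O(N^2), fine for short frames)."""
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    # Hilbert mask: keep DC (and Nyquist when n is even), double the positive
    # frequencies, zero the negative ones.
    mask = [0.0] * n
    mask[0] = 1.0
    for k in range(1, (n + 1) // 2):
        mask[k] = 2.0
    if n % 2 == 0:
        mask[n // 2] = 1.0
    spec = [s * m for s, m in zip(spec, mask)]
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def marginal_spectrum(imfs, fs, n_bins=32):
    """Sum the instantaneous amplitude of every IMF into frequency bins on [0, fs/2)."""
    nyquist = fs / 2.0
    spectrum = [0.0] * n_bins
    for imf in imfs:
        z = analytic_signal(imf)
        phase = [cmath.phase(v) for v in z]
        for t in range(1, len(z)):
            # Wrap the phase increment into [-pi, pi) before converting it to Hz.
            dphi = (phase[t] - phase[t - 1] + math.pi) % (2 * math.pi) - math.pi
            freq = dphi * fs / (2 * math.pi)
            if 0.0 <= freq < nyquist:
                idx = min(int(freq / nyquist * n_bins), n_bins - 1)
                spectrum[idx] += abs(z[t])
    return spectrum
```

For a single 9 Hz sinusoid sampled at 64 Hz with 16 bins, all of the amplitude lands in the 8 to 10 Hz bin. How the thesis reduces the spectrum to the final MSA vector (the bins themselves or statistics of them) is not stated in the abstract.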
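
Contribution (2) selects the "effective" IMFs with the Spearman rank correlation coefficient and turns them into an energy feature. The abstract does not say what each IMF is correlated with or which cut-off is used; this sketch assumes correlation against the original signal, a hypothetical threshold of 0.3 on |ρ|, and energies normalized to sum to one. Spearman's ρ is computed as the Pearson correlation of average ranks, which handles ties correctly.

```python
import math


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # a constant sequence carries no rank information
    return cov / math.sqrt(vx * vy)


def imfe(signal, imfs, threshold=0.3):
    """Normalized energies of the IMFs whose |rho| with the signal reaches threshold."""
    selected = [imf for imf in imfs if abs(spearman(signal, imf)) >= threshold]
    energies = [sum(v * v for v in imf) for imf in selected]
    total = sum(energies)
    return [e / total for e in energies] if total > 0 else energies
```

Because the number of selected IMFs can differ between utterances, a practical system needs a fixed-length convention for the IMFE vector; the abstract does not describe which one the thesis uses.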
【Degree-granting institution】: Taiyuan University of Technology
【Degree level】: Master's
【Year degree awarded】: 2017
【Classification number】: TN912.34
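
KELM is the recognition network used in every experiment described in the abstract. Below is a minimal sketch of the standard kernel extreme learning machine, assuming a Gaussian kernel K(u, v) = exp(-γ‖u − v‖²) and one-hot targets T: training solves β = (I/C + Ω)⁻¹T with Ω_ij = K(x_i, x_j), and the numeric outputs for a sample x are f(x) = [K(x, x_1), …, K(x, x_N)] β. The kernel, the regularization constant `c`, and `gamma` used in the thesis are not given in the abstract, so the defaults here are placeholders; a small Gauss-Jordan solver keeps the example dependency-free.

```python
import math


def rbf(u, v, gamma):
    """Gaussian kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def solve(a, b):
    """Solve A X = B (A square, B with one column per class) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [row_a[:] + row_b[:] for row_a, row_b in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class KELM:
    """Kernel ELM: beta = (I/C + Omega)^-1 T, outputs f(x) = k(x)^T beta."""

    def __init__(self, c=100.0, gamma=1.0):
        self.c, self.gamma = c, gamma

    def fit(self, x, labels, n_classes):
        self.x_train = x
        n = len(x)
        omega = [[rbf(x[i], x[j], self.gamma) + (1.0 / self.c if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        targets = [[1.0 if k == y else 0.0 for k in range(n_classes)] for y in labels]
        self.beta = solve(omega, targets)
        return self

    def decision(self, sample):
        """Raw numeric outputs, one score per class."""
        k = [rbf(sample, xi, self.gamma) for xi in self.x_train]
        return [sum(k[i] * self.beta[i][c] for i in range(len(k)))
                for c in range(len(self.beta[0]))]

    def predict(self, sample):
        scores = self.decision(sample)
        return scores.index(max(scores))
```

The raw scores returned by `decision` are exactly the "numerical outputs" that the decision-level fusion in contribution (4) converts into probabilities.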

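
Contribution (4) converts each single classifier's numeric outputs into probabilities, derives adaptive weights from the resulting probability matrix, and linearly combines the weighted probabilities. The abstract specifies neither the conversion nor the decision strategy, so this sketch swaps in two stand-ins: a softmax for the probability conversion and a hypothetical per-sample rule that weights each classifier by the gap between its top two class probabilities. It illustrates the structure of the fusion, not the thesis's actual strategy.

```python
import math


def softmax(scores):
    """Map raw classifier outputs to probabilities (shifted for numerical stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def margin_weights(prob_matrix):
    """Hypothetical adaptive rule: weight each classifier by its top-2 probability margin."""
    margins = []
    for probs in prob_matrix:
        first, second = sorted(probs, reverse=True)[:2]
        margins.append(first - second)
    total = sum(margins)
    if total == 0:
        return [1.0 / len(margins)] * len(margins)  # no classifier is confident: equal weights
    return [m / total for m in margins]


def fuse(raw_outputs):
    """raw_outputs[m] holds the numeric scores of classifier m for one test sample."""
    prob_matrix = [softmax(s) for s in raw_outputs]
    weights = margin_weights(prob_matrix)
    n_classes = len(prob_matrix[0])
    fused = [sum(w * p[c] for w, p in zip(weights, prob_matrix)) for c in range(n_classes)]
    return fused.index(max(fused)), fused
```

With three feature-specific classifiers and four emotion classes, raw outputs such as [[2.0, 0, 0, 0], [0.1, 0.2, 0, 0], [0, 0.3, 0, 0]] give the confident first classifier most of the weight, so the fused decision is class 0 even though the other two lean slightly toward class 1. Because the weights and each probability row both sum to one, the fused vector is itself a probability distribution.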
Article link: http://sikaile.net/kejilunwen/xinxigongchenglunwen/1396176.html