Research and Design of a Deep-Learning-Based Emotion Perception System
Published: 2018-03-30 14:45
Topic: deep learning. Focus: emotion perception. Source: University of Electronic Science and Technology of China, 2017 master's thesis.
【Abstract】: Emotion perception, the recognition of human emotions, is an important aspect of artificial intelligence research. To improve the human-computer interaction experience and let machines better understand human emotion, researchers have studied the human voice, facial expressions, body movements, and other signals; emotion perception from speech is the focus of this thesis. Deep learning is currently the most active area of artificial intelligence and has achieved notable results in speech recognition, image recognition, and natural language processing. Its rapid development has also produced a number of effective models, such as the deep belief network (DBN), the convolutional neural network (CNN), and the recurrent neural network (RNN). How to apply deep learning to raise the accuracy of speech-based emotion perception is a new research problem. Addressing this problem, this thesis takes the application of deep learning methods to improving emotion perception accuracy as its subject: it summarizes the theory of traditional speech emotion perception, gives a detailed theoretical analysis of the main deep learning models, builds a deep learning model on the TensorFlow platform, and designs a client/server (C/S) speech emotion perception system with an iOS mobile client. The main work is as follows: 1. The thesis surveys traditional emotion perception methods and analyzes their strengths and weaknesses. Traditional approaches rely on hand-crafted feature extraction, with many feature types in use, the most common being MFCC (Mel-frequency cepstral coefficients); recent results in speech recognition, however, show that converting the audio into a spectrogram and feeding it to a neural network for automatic feature learning yields better training results, so this thesis also adopts spectrogram input with automatic feature learning for speech emotion perception. 2. The thesis analyzes the mainstream deep learning models and the deep learning methods adopted in existing work, and proposes applying an XNN-SVM model to speech emotion perception. A system prototype using the XNN-SVM model was built on the TensorFlow platform, and several comparative experiments on this prototype demonstrate the model's improvement. 3. The thesis designs and implements a dual-end speech emotion perception system in C/S mode: recognition can run locally on the phone, and can also be performed by the server, whose feedback helps improve the model. A set of 300 speech emotion samples was collected for system testing, verifying the model's engineering practicality.
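As a concrete illustration of the spectrogram input described in the abstract, a minimal sketch in plain NumPy follows; a synthetic tone stands in for real speech, and the frame length, hop size, and sample rate are illustrative assumptions rather than the thesis's actual settings:

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Frame the waveform, apply a Hann window, and take the
    log-magnitude of each frame's FFT (a basic spectrogram)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len//2 + 1)
    return np.log(mag + eps)

# One second of a 440 Hz tone at 16 kHz stands in for a speech sample.
sr = 16000
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (98, 201): 98 frames, 201 frequency bins
```

The resulting 2-D time-frequency array is what would be fed to a convolutional network as an image-like input, replacing hand-crafted MFCC vectors.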
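The abstract does not spell out the XNN-SVM architecture. The general hybrid idea, a neural network acting as a feature extractor whose embeddings are then classified by an SVM, can be sketched with a toy linear SVM trained by hinge-loss subgradient descent; all names, dimensions, and data below are illustrative, not taken from the thesis:

```python
import numpy as np

def train_linear_svm(X, y, lr=0.01, reg=0.01, epochs=200):
    """Binary linear SVM via hinge-loss subgradient descent.
    X: (n, d) feature vectors (stand-ins for network embeddings);
    y: labels in {-1, +1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1  # examples inside the margin contribute gradient
        grad_w = reg * w - (y[mask][:, None] * X[mask]).mean(axis=0) if mask.any() else reg * w
        grad_b = -y[mask].mean() if mask.any() else 0.0
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = np.random.default_rng(0)
# Toy "embeddings": two separable clusters standing in for CNN features.
X = np.vstack([rng.normal(-2, 0.5, (50, 8)), rng.normal(2, 0.5, (50, 8))])
y = np.concatenate([-np.ones(50), np.ones(50)])
w, b = train_linear_svm(X, y)
acc = (np.sign(X @ w + b) == y).mean()
print(round(acc, 2))
```

In the hybrid arrangement the network's final softmax layer is replaced by (or compared against) such a margin-based classifier on the learned features; a multi-class version would train one such boundary per emotion category.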
【Degree-granting institution】: University of Electronic Science and Technology of China
【Degree level】: Master's
【Year conferred】: 2017
【Classification number】: TP18
Link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1686373.html