Research on Facial Emotion Recognition Based on Deep Learning
Published: 2018-03-15 23:25
Topic: facial expression recognition Focus: features Source: Harbin Institute of Technology, 2017 master's thesis Thesis type: degree thesis
【Abstract】: With the development of artificial intelligence, emotion recognition is finding ever wider application, typically in advertising effectiveness evaluation, product testing, video analysis, medical rehabilitation, safe driving, and emotional robots. Emotion recognition is currently advancing especially fast in human-computer interaction, above all in safe driving and emotional-robot applications; enabling machines to understand people better and to serve humans more intelligently and humanely is fundamental to the recent AI revolution. Once machines acquire sufficient emotional-cognitive ability, they can progressively upgrade the user experience in human-computer interaction until, ultimately, they integrate into human life as naturally as ordinary people. Broadly, emotion recognition can be performed through facial expressions, speech and intonation, or EEG capture. The most technically mature and widely deployed approach is facial expression recognition: computer-vision algorithms recognize facial expression movements and infer basic emotions such as happiness, anger, and sadness. Because different people express emotion to different degrees, automatic Facial Expression Recognition (FER) remains a challenging and interesting problem in computer vision. Despite efforts to develop a variety of FER methods, existing approaches generalize poorly to unlabeled images or images captured in natural environments. Most existing methods rely on handcrafted features (e.g., histograms of oriented gradients, local binary patterns, and Gabor descriptors) combined with a classifier (e.g., a support vector machine) whose hyperparameters are tuned to give the best recognition accuracy on a single database or a small set of similar databases. Different feature descriptors vary in how well they represent expression images against different backgrounds, so the most suitable descriptor must be found for each specific background, which greatly increases the engineering workload. Deep learning, by contrast, learns facial features automatically and is end-to-end: feature learning and classification are completed within a single model. Building on the Inception structure proposed by Google, this thesis proposes a deep neural network architecture that removes the need to search for different feature descriptors for images with different backgrounds, and streamlines the model so that it can be deployed successfully on mobile devices. Specifically, our network consists of two convolutional layers, each followed by max pooling, and then three Inception layers. The network is a single-component architecture that takes a registered face image as input and classifies it into one of the six basic expressions or the neutral expression. Comprehensive experiments were conducted on seven publicly available facial expression databases (MultiPIE, MMI, CK+, DISFA, FERA, SFEW, and FER2013). The main comparison analyzes the cross-database generalization ability of traditional-feature-based learning methods versus deep learning methods; the experiments show that the deep learning methods generalize better than the traditional-feature-based methods. In addition, comparisons with current mainstream models such as VGG, GoogLeNet, and ResNet on the expression recognition task further show that the Inception-based structure can minimize model size while maintaining expression recognition accuracy.
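The architecture described above (two convolution + max-pooling stages, then three Inception modules, then a 7-way classifier) can be illustrated with a minimal shape-tracing sketch. The input size and all filter counts below are illustrative assumptions in the style of FER2013/GoogLeNet, not the thesis's actual hyperparameters:

```python
# Trace tensor shapes (H, W, C) through the described architecture.
# Assumptions: 48x48 grayscale face crop (FER2013-style); filter widths
# are illustrative, borrowed loosely from GoogLeNet, not from the thesis.

def conv2d(shape, out_channels, kernel, stride=1, pad=0):
    """Output shape of a 2D convolution."""
    h, w, _ = shape
    h = (h + 2 * pad - kernel) // stride + 1
    w = (w + 2 * pad - kernel) // stride + 1
    return (h, w, out_channels)

def maxpool(shape, kernel=2, stride=2):
    """Output shape of 2D max pooling (channels unchanged)."""
    h, w, c = shape
    return ((h - kernel) // stride + 1, (w - kernel) // stride + 1, c)

def inception(shape, b1, b3, b5, bp):
    """An Inception module runs 1x1, padded 3x3, padded 5x5, and pooling
    branches in parallel and concatenates along the channel axis, so the
    spatial size is preserved and channels are summed."""
    h, w, _ = shape
    return (h, w, b1 + b3 + b5 + bp)

shape = (48, 48, 1)                                   # grayscale input
shape = conv2d(shape, 64, kernel=7, stride=2, pad=3)  # conv stage 1
shape = maxpool(shape)
shape = conv2d(shape, 192, kernel=3, pad=1)           # conv stage 2
shape = maxpool(shape)
for widths in [(64, 128, 32, 32), (128, 192, 96, 64), (192, 208, 48, 64)]:
    shape = inception(shape, *widths)                 # three Inception layers

num_classes = 7  # six basic expressions + neutral
print(shape)     # spatial size is preserved by the padded Inception branches
```

Because the Inception branches are padded to preserve spatial size, model capacity grows mainly in the channel dimension, which is what lets an Inception-style network stay compact enough for mobile deployment while keeping accuracy.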
【Degree-Granting Institution】: Harbin Institute of Technology
【Degree Level】: Master's
【Year of Degree Award】: 2017
【CLC Number】: TP391.41; TP181
Document ID: 1617320
Link: http://sikaile.net/kejilunwen/zidonghuakongzhilunwen/1617320.html