Research on Affective Visual Question Answering
Published: 2021-03-23 02:58
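Visual Question Answering (VQA) has recently attracted wide attention from researchers in machine learning. Many researchers have proposed attention models to address the need to focus on local regions of an image, but they have overlooked the basic affective information in images and videos during feature extraction, and the answers themselves convey little emotion, so the generated answers are not sufficiently natural or realistic. This dissertation therefore aims to generate more natural answers that reflect emotion by adding an analysis of the affective information in the question and the image (or video), filling the gap left by the absence of affective information in VQA. Specifically, it addresses VQA for images with a single mood, images with multiple moods, and videos. The results can be applied directly to education, visual assistance for the blind, health, and other fields. The main contributions are as follows: (1) An attention-based method for single-mood image question answering, Mood-Aware Image Question Answering (MAIQA), is proposed. It combines local image features with mood information detected from specific image regions and from the question to produce answers that contain affective information. Here, mood refers only to the mood of the people who appear in the image, not to that of other objects. Specifically, the image, question, and mood features are embedded into a single Long Short-Term Memory (LSTM) network, and respectively adopt...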
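As a rough illustration of the idea in contribution (1), the following is a minimal sketch of attention over image regions followed by fusion of image, question, and mood features. It is not the dissertation's implementation: the abstract is truncated before it describes the attention and fusion details, so the dot-product scoring, the summed query, the concatenation step, and all names and toy values here are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(regions, query):
    """Score each region against the query, then return the weighted sum.

    Assumption: plain dot-product attention stands in for whatever
    attention model the dissertation actually uses.
    """
    weights = softmax([dot(r, query) for r in regions])
    dim = len(regions[0])
    context = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return context, weights


def fuse(image_ctx, question_vec, mood_vec):
    """Concatenate the three feature vectors into one input vector.

    Assumption: concatenation is a hypothetical fusion choice; the fused
    vector would then be fed to a sequence model such as an LSTM.
    """
    return image_ctx + question_vec + mood_vec


# Toy, hand-made features (hypothetical values, not real embeddings).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question_vec = [0.5, 0.5]
mood_vec = [1.0, 0.0]

# Hypothetical query: element-wise sum of the question and mood vectors.
query = [q + m for q, m in zip(question_vec, mood_vec)]
context, weights = attend(regions, query)
fused = fuse(context, question_vec, mood_vec)
```

In this sketch the mood vector shifts the query, so regions that agree with both the question and the detected mood receive more attention weight. A real system would replace the toy vectors with learned embeddings.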
[Source]: Jiangsu University, Jiangsu Province
[Pages]: 128
[Degree level]: Doctoral
[Table of contents]:
Abstract
Abstract (in Chinese)
Chapter 1 Introduction
1.1 Background and motivation
1.2 Challenges
1.3 Contributions
1.4 Outline of the dissertation
Chapter 2 Review of related literature
2.1 Visual question answering
2.1.1 Image question answering
2.1.2 Video question answering
2.2 Mood detection
2.2.1 Mood detection on images
2.2.2 Mood detection on videos
2.3 Visual captioning
2.3.1 Image captioning
2.3.2 Video captioning
2.4 Multi-task learning
2.5 Feature embeddings
2.6 Visual mood attribute detection
2.7 Attention models
2.8 Traditional visual question answering
Chapter 3 Mood-aware image question answering
3.1 Introduction
3.2 The MAIQA model
3.2.1 Image, question and mood embeddings
3.2.2 Attention models for the image, question and mood
3.2.3 Feature learning and inference
3.2.4 Vocabulary
3.2.5 Feature fusion
3.2.6 Answer prediction
3.3 Experiments and results
3.3.1 The image dataset customization
3.3.2 Experiment setup
3.3.3 Qualitative analysis of sample answers
3.3.4 Comparison of our mood detector with other baseline models
3.3.5 Possible answer categories
3.3.6 Comparison of the performance of our attention models
3.3.7 Comparison of the MAIQA LSTM model with other models
3.4 Brief summary
Chapter 4 Multi-mood image question answering
4.1 Introduction
4.2 The MMIQA model
4.2.1 Image feature extraction, embedding and attention
4.2.2 Question feature embedding and attention
4.2.3 Mood feature detection, embedding and attention
4.2.4 Triple attention model
4.2.5 Answer vocabulary
4.2.6 Fusion of features
4.2.7 Answer generation
4.3 Experiments and results
4.3.1 The image dataset customization
4.3.2 Experiment setup
4.3.3 Qualitative analysis
4.3.4 Comparison of feature embedding techniques using different dataset conditions
4.3.5 Comparison of validation results of our feature embedding techniques
4.3.6 Comparison of the accuracy of different multi-mood detectors
4.3.7 Analysis of the contribution of the multi-mood detector to performance of MMIQA
4.3.8 Overall comparison of MMIQA with the baseline model
4.4 Brief summary
Chapter 5 Multi-mood video question answering
5.1 Introduction
5.2 The MMVQA model
5.2.1 Overview
5.2.2 Video QA route for the main question answering task
5.2.3 Affective route for mood detection
5.2.4 Prediction of the conventional and affective answers
5.3 Experiments and results
5.3.1 Video datasets
5.3.2 Experiment setup
5.3.3 Comparison with mood detection baseline model
5.3.4 Attention model ablation studies
5.3.5 Analysis of the accuracy of MMVQA conventional answers
5.3.6 Analysis of the accuracy of MMVQA affective answers
5.3.7 Qualitative analysis
5.4 Brief summary
Chapter 6 General conclusions and future work
6.1 General conclusions
6.2 Our work
6.3 Future work
Bibliography
Acknowledgements
Academic Publications
Article ID: 3094998
Article link: http://sikaile.net/kejilunwen/shengwushengchang/3094998.html