University of Southampton Institutional Repository

Recognizing Emotions in Video Using Multimodal DNN Feature Fusion


Williams, Jennifer, Kleinegesse, Steven, Comanescu, Ramona and Radu, Oana (2018) Recognizing Emotions in Video Using Multimodal DNN Feature Fusion. In ACL 2018 Grand Challenge and Workshop on Human Multimodal Language (Challenge-HML). Association for Computational Linguistics (ACL). pp. 11-19. (doi:10.18653/v1/W18-3302).

Record type: Conference or Workshop Item (Paper)

Abstract

We present our system description of input-level multimodal fusion of audio, video, and text for recognition of emotions and their intensities for the 2018 First Grand Challenge on Computational Modeling of Human Multimodal Language. Our proposed approach is based on input-level feature fusion with sequence learning from Bidirectional Long Short-Term Memory (BLSTM) deep neural networks (DNNs). We show that our fusion approach outperforms unimodal predictors. Our system performs 6-way simultaneous classification and regression, allowing for overlapping emotion labels in a video segment. This leads to an overall binary accuracy of 90%, an overall 4-class accuracy of 89.2%, and an overall mean absolute error (MAE) of 0.12. Our work shows that an early fusion technique can effectively predict the presence of multi-label emotions as well as their coarse-grained intensities. The presented multimodal approach creates a simple and robust baseline on this new Grand Challenge dataset. Furthermore, we provide a detailed analysis of emotion intensity distributions as output from our DNN, as well as a related discussion concerning the inherent difficulty of this task.
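The record has no code attached; purely as an illustration of the input-level (early) fusion with a BLSTM described in the abstract, the sketch below shows one plausible arrangement in PyTorch. The feature dimensions, the use of the final timestep as a segment summary, and the single linear layer producing six emotion intensities are assumptions made for the example, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class EarlyFusionBLSTM(nn.Module):
    """Illustrative sketch of input-level (early) fusion: time-aligned audio,
    video, and text features are concatenated per timestep, the fused sequence
    is modelled with a bidirectional LSTM, and one intensity score is
    predicted per emotion (6-way, allowing overlapping labels)."""

    def __init__(self, audio_dim=74, video_dim=35, text_dim=300,
                 hidden_dim=128, num_emotions=6):
        # Feature dimensions here are assumptions for illustration only.
        super().__init__()
        fused_dim = audio_dim + video_dim + text_dim
        self.blstm = nn.LSTM(fused_dim, hidden_dim,
                             batch_first=True, bidirectional=True)
        # One regression output per emotion; a presence/absence decision can
        # be obtained by thresholding the predicted intensity.
        self.heads = nn.Linear(2 * hidden_dim, num_emotions)

    def forward(self, audio, video, text):
        # Each input: (batch, time, feature_dim), assumed time-aligned.
        fused = torch.cat([audio, video, text], dim=-1)
        outputs, _ = self.blstm(fused)
        # Use the final timestep's bidirectional state as the segment summary.
        return self.heads(outputs[:, -1, :])

model = EarlyFusionBLSTM()
a = torch.randn(8, 20, 74)    # hypothetical acoustic features
v = torch.randn(8, 20, 35)    # hypothetical visual features
t = torch.randn(8, 20, 300)   # hypothetical word embeddings
scores = model(a, v, t)       # (8, 6) emotion intensity predictions
```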

This record has no associated files available for download.

More information

Published date: 1 July 2018

Identifiers

Local EPrints ID: 470338
URI: http://eprints.soton.ac.uk/id/eprint/470338
PURE UUID: c1498a59-76e5-4423-8c06-a31c03e3bd1f
ORCID for Jennifer Williams: orcid.org/0000-0003-1410-0427

Catalogue record

Date deposited: 06 Oct 2022 16:55
Last modified: 20 Jul 2024 02:07


Contributors

Author: Jennifer Williams
Author: Steven Kleinegesse
Author: Ramona Comanescu
Author: Oana Radu

