Augmenting pre-trained language models with audio feature embedding for argumentation mining in political debates
Mestre, Rafael
33721a01-ab1a-4f71-8b0e-abef8afc92f3
Middleton, Stuart E.
404b62ba-d77e-476b-9775-32645b04473f
Ryan, Matt
f07cd3e8-f3d9-4681-9091-84c2df07cd54
Gheasi, Masood
0e1a0af4-3f82-4498-a5e5-4f7f7618d68e
Norman, Timothy
663e522f-807c-4569-9201-dc141c8eb50d
Zhu, Jiatong
52569115-5d72-4fc0-8876-a66b991ed209
17 March 2023
Mestre, Rafael, Middleton, Stuart E., Ryan, Matt, Gheasi, Masood, Norman, Timothy and Zhu, Jiatong
(2023)
Augmenting pre-trained language models with audio feature embedding for argumentation mining in political debates.
In Findings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL).
Record type: Conference or Workshop Item (Paper)
Abstract
The integration of multimodality in natural language processing (NLP) tasks seeks to exploit the complementary information contained in two or more modalities, such as text, audio and video. This paper investigates the integration of often under-researched audio features with text, using the task of argumentation mining (AM) as a case study. We take a previously reported dataset and present an audio-enhanced version (the Multimodal USElecDeb60To16 dataset). We report the performance of two text models based on BERT and GloVe embeddings, one audio model (based on CNN and Bi-LSTM) and multimodal combinations, on a dataset of 28,850 utterances. The results show that multimodal models do not outperform text-based models when using the full dataset. However, we show that audio features add value in fully supervised scenarios with limited data. We find that when data is scarce (e.g. with 10% of the original dataset) multimodal models yield improved performance, whereas text models based on BERT considerably decrease performance. Finally, we conduct a study with artificially generated voices and an ablation study to investigate the importance of different audio features in the audio models.
Text: 2023.findings-eacl.21 - Version of Record
Text: mestre_2023_MultimodalUSElecDeb60to16
More information
Published date: 17 March 2023
Identifiers
Local EPrints ID: 475962
URI: http://eprints.soton.ac.uk/id/eprint/475962
PURE UUID: ecbdf994-e027-4fd8-8476-fc98b6c2b383
Catalogue record
Date deposited: 03 Apr 2023 16:33
Last modified: 17 Mar 2024 04:06