The University of Southampton
University of Southampton Institutional Repository

Potential and pitfalls of audio as data for political research: alignment, features, and classification models

Mestre, Rafael
33721a01-ab1a-4f71-8b0e-abef8afc92f3
Ryan, Matt
f07cd3e8-f3d9-4681-9091-84c2df07cd54

Mestre, Rafael and Ryan, Matt (2026) Potential and pitfalls of audio as data for political research: alignment, features, and classification models. Political Analysis. (doi:10.1017/pan.2025.10031).

Record type: Article

Abstract

Political science is a field rich in multimodal information sources, from televised debates to parliamentary briefings. This paper bridges a gap between computer science and political science in multimodal data analysis using audio. The adoption of multimodal analyses in political science (e.g., video/audio with text-as-data approaches) has been relatively slow due to the unequal distribution of the computational power and skills needed. We provide solutions to challenges encountered when analyzing audio, advancing the potential for multimodal data analysis in political science. Using a dataset of all televised U.S. presidential debates from 1960 to 2020, we focus on three types of features encountered when analyzing audio data: low-level descriptors (LLDs), such as pitch or energy; Mel-frequency cepstral coefficients (MFCCs); and audio embeddings/encodings, like Wav2Vec. We showcase four applications: (a) forced alignment of audio text using MFCCs, time-stamping transcripts, and speaker information; (b) speech characterization using LLDs; (c) custom-made classification models with audio embeddings and MFCCs; and (d) emotion recognition models using Wav2Vec for classification of discrete emotions and their valence-arousal-dominance. We explain how these features can be applied to different political research questions and offer advice on guarding against naive interpretation, for both experienced researchers and those who want to start working with audio.

Text
potential-and-pitfalls-of-audio-as-data-for-political-research-alignment-features-and-classification-models - Version of Record
Available under License Creative Commons Attribution.

More information

e-pub ahead of print date: 30 January 2026
Keywords: analysis of political speech, computational methods, machine learning

Identifiers

Local EPrints ID: 510370
URI: http://eprints.soton.ac.uk/id/eprint/510370
PURE UUID: 03d01919-93ef-4ee0-8270-e42e84ef6f7d
ORCID for Rafael Mestre: orcid.org/0000-0002-2460-4234
ORCID for Matt Ryan: orcid.org/0000-0002-8693-5063

Catalogue record

Date deposited: 27 Mar 2026 17:50
Last modified: 28 Mar 2026 03:04


