A new approach to voice authenticity
Müller, Nicolas M.
Kawa, Piotr
Hu, Shen
Neu, Matthias
Williams, Jennifer
Sperl, Philip
Böttinger, Konstantin
1 September 2024
Müller, Nicolas M., Kawa, Piotr, Hu, Shen, Neu, Matthias, Williams, Jennifer, Sperl, Philip and Böttinger, Konstantin (2024) A new approach to voice authenticity. Interspeech 2024, Kos Island, Greece, 01-05 Sep 2024. 5 pp.
Record type: Conference or Workshop Item (Paper)
Abstract
Voice faking poses significant societal challenges. Currently, the prevailing assumption is that unaltered human speech can always be considered genuine, while fake speech usually comes from text-to-speech (TTS) synthesis. We argue that this type of binary distinction is oversimplified. For instance, altered playback speeds can maliciously deceive listeners, as in the ‘Drunken Nancy Pelosi’ incident. Similarly, editing of audio clips can be done ethically, e.g. for brevity or summarization in news reporting or podcasts, but editing can also create misleading narratives. In this paper, we propose a conceptual shift away from the longstanding binary paradigm of speech audio being either ‘fake’ or ‘real’. Instead, we focus on pinpointing ‘voice edits’, which encompass traditional modifications like filters and cuts, as well as neural synthesis. We delineate six categories of voice edits and curate a new challenge dataset, for which we present baseline voice edit detection systems.
Text: muller24_interspeech - Accepted Manuscript
More information
Published date: 1 September 2024
Venue - Dates: Interspeech 2024, Kos Island, Greece, 2024-09-01 - 2024-09-05
Identifiers
Local EPrints ID: 502676
URI: http://eprints.soton.ac.uk/id/eprint/502676
PURE UUID: 01d256ee-ee75-4fa0-b8f0-7be4a9f4d473
Catalogue record
Date deposited: 04 Jul 2025 16:34
Last modified: 22 Aug 2025 02:34
Contributors
Author: Nicolas M. Müller
Author: Piotr Kawa
Author: Shen Hu
Author: Matthias Neu
Author: Jennifer Williams
Author: Philip Sperl
Author: Konstantin Böttinger