Attacker attribution of audio deepfakes
Deepfakes are synthetically generated media, often devised with malicious intent. They have become increasingly convincing with large training datasets and advanced neural networks. These fakes are readily misused for slander, misinformation and fraud. For this reason, intensive research on countermeasures is also expanding. However, recent work is almost exclusively limited to deepfake detection, i.e. predicting whether audio is real or fake. This is despite the fact that attribution (who created which fake?) is an essential building block of a larger defense strategy, as has long been practiced in the field of cybersecurity. This paper considers the problem of deepfake attacker attribution in the domain of audio. We present several methods for creating attacker signatures using low-level acoustic descriptors and machine learning embeddings. We show that speech signal features are inadequate for characterizing attacker signatures. However, we also demonstrate that embeddings from a recurrent neural network can successfully characterize attacks from both known and unknown attackers. Our attack signature embeddings result in distinct clusters, both for seen and unseen audio deepfakes. We show that these embeddings can be used in downstream tasks to great effect, scoring 97.10% accuracy in attacker-ID classification.
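The attribution idea in the abstract can be illustrated with a minimal sketch: if attacker-signature embeddings form distinct clusters, an unseen deepfake can be attributed by finding the nearest attacker centroid. The embeddings below are synthetic stand-ins; the paper derives its signatures from a recurrent neural network, whose architecture and data are not reproduced here, and the nearest-centroid classifier is an illustrative choice, not necessarily the paper's downstream model.

```python
# Hedged sketch: attacker-ID attribution from fixed-length signature
# embeddings via nearest-centroid classification. The embeddings are
# simulated as Gaussian clusters, one per attacker (an assumption made
# only so the example is self-contained and runnable).
import numpy as np

rng = np.random.default_rng(0)
N_ATTACKERS, DIM, PER_ATTACKER = 4, 16, 30

# Simulate one well-separated cluster of signature embeddings per attacker.
centroids_true = rng.normal(0.0, 5.0, size=(N_ATTACKERS, DIM))
X = np.vstack([c + rng.normal(0.0, 1.0, size=(PER_ATTACKER, DIM))
               for c in centroids_true])
y = np.repeat(np.arange(N_ATTACKERS), PER_ATTACKER)

# "Enroll" each attacker by averaging its training embeddings.
train = np.arange(len(y)) % 2 == 0          # simple even/odd split
centroids = np.stack([X[train & (y == k)].mean(axis=0)
                      for k in range(N_ATTACKERS)])

# Attribute each held-out embedding to the nearest attacker centroid.
test = ~train
d = np.linalg.norm(X[test, None, :] - centroids[None, :, :], axis=2)
pred = d.argmin(axis=1)
accuracy = (pred == y[test]).mean()
print(f"attacker-ID accuracy: {accuracy:.2%}")
```

With clearly separated clusters, as the abstract reports for the learned embeddings, even this simple classifier attributes held-out samples reliably; with the inadequate speech-signal features the clusters would overlap and accuracy would drop.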
Pages: 2788-2792
Williams, Jennifer
18 September 2022
Record type: Conference or Workshop Item (Paper)
Text: muller22b_interspeech (Version of Record, restricted to repository staff only)
More information
Published date: 18 September 2022
Venue and dates: Interspeech 2022, Incheon, Republic of Korea, 18-22 September 2022
Identifiers
Local EPrints ID: 501848
URI: http://eprints.soton.ac.uk/id/eprint/501848
ISSN: 2958-1796
PURE UUID: 6fb978e2-5f24-47e5-8eed-298d048c9382
Catalogue record
Date deposited: 11 Jun 2025 16:31
Last modified: 22 Aug 2025 02:34
Contributors
Author: Jennifer Williams