Attacker attribution of audio deepfakes
Müller, Nicolas M.
e054cb2d-3ad5-4674-b44e-406a6c2c1dfe
Dieckmann, Franziska
9bf042b0-fbfc-41cf-8844-b056b394902d
Williams, Jennifer
3a1568b4-8a0b-41d2-8635-14fe69fbb360
28 March 2022
Abstract
Deepfakes are synthetically generated media, often devised with malicious intent. They have become increasingly convincing with large training datasets and advanced neural networks. These fakes are readily misused for slander, misinformation and fraud. For this reason, intensive research on countermeasures is also expanding. However, recent work is almost exclusively limited to deepfake detection: predicting whether audio is real or fake. This is despite the fact that attribution (who created which fake?) is an essential building block of a larger defense strategy, as has long been practiced in the field of cybersecurity. This paper considers the problem of deepfake attacker attribution in the audio domain. We present several methods for creating attacker signatures using low-level acoustic descriptors and machine learning embeddings. We show that speech signal features are inadequate for characterizing attacker signatures. However, we also demonstrate that embeddings from a recurrent neural network can successfully characterize attacks from both known and unknown attackers. Our attack signature embeddings form distinct clusters, both for seen and unseen audio deepfakes. We show that these embeddings can be used in downstream tasks to great effect, scoring 97.10% accuracy in attacker-ID classification.
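To make the downstream use of attacker signatures concrete, here is a minimal, hypothetical sketch (not the authors' code or data): it stands in synthetic fixed-size embedding vectors for the paper's RNN-derived attack signatures, with each "attacker" forming its own cluster, and classifies held-out embeddings by nearest cluster centroid. The dimensions, attacker count, and nearest-centroid rule are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for RNN-derived attack-signature embeddings: 3 hypothetical
# attackers, each producing embeddings scattered around its own centroid.
n_attackers, dim, per_attacker = 3, 16, 50
centroids = rng.normal(0.0, 5.0, size=(n_attackers, dim))
X = np.vstack([c + rng.normal(0.0, 1.0, size=(per_attacker, dim))
               for c in centroids])
y = np.repeat(np.arange(n_attackers), per_attacker)

def predict(x, centroids):
    """Assign an embedding to the attacker with the nearest centroid."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

preds = np.array([predict(x, centroids) for x in X])
accuracy = (preds == y).mean()
print(f"attacker-ID accuracy: {accuracy:.2%}")
```

If the embeddings cluster as cleanly as the abstract reports for the real system, even this trivial classifier separates attackers well; the paper's 97.10% figure refers to its own classifier and data, not to this toy setup.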
Text
2203.15563v1
- Author's Original
More information
Published date: 28 March 2022
Additional Information:
Submitted to Interspeech 2022
Keywords:
cs.CR, cs.LG, cs.SD
Identifiers
Local EPrints ID: 471724
URI: http://eprints.soton.ac.uk/id/eprint/471724
PURE UUID: 9d607292-883c-4543-8fbd-295cff05bbc1
Catalogue record
Date deposited: 17 Nov 2022 17:33
Last modified: 17 Mar 2024 04:12
Contributors
Author:
Nicolas M. Müller
Author:
Franziska Dieckmann
Author:
Jennifer Williams