The University of Southampton
University of Southampton Institutional Repository

Speech replay detection with x-Vector Attack Embeddings and spectral features


Williams, Jennifer and Rownicka, Joanna (2019) Speech replay detection with x-Vector Attack Embeddings and spectral features. Interspeech 2019, Graz, Austria. 15 - 19 Sep 2019. pp. 1053-1057. (doi:10.21437/Interspeech.2019-1760).

Record type: Conference or Workshop Item (Paper)

Abstract

We present our system submission to the ASVspoof 2019 Challenge Physical Access (PA) task. The objective for this challenge was to develop a countermeasure that identifies speech audio as either bona fide or intercepted and replayed. The target prediction was a score indicating whether a speech segment was bona fide (positive values) or “spoofed” (negative values). Our system used convolutional neural networks (CNNs) and a representation of the speech audio that combined x-vector attack embeddings with signal processing features. The x-vector attack embeddings were created from mel-frequency cepstral coefficients (MFCCs) using a time-delay neural network (TDNN). These embeddings jointly modeled 27 different environments and 9 types of attacks from the labeled data. We also used sub-band spectral centroid magnitude coefficients (SCMCs) as features. We included an additive Gaussian noise layer during training to augment the data and make our system more robust to previously unseen attack examples. We report system performance using the tandem detection cost function (tDCF) and equal error rate (EER). Our approach performed better than both of the challenge baselines. Our results suggest that the x-vector attack embeddings can help regularize the CNN predictions even when environments or attacks are more challenging.
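
As an illustration of the EER metric named in the abstract, the following is a minimal sketch of how an equal error rate could be estimated from countermeasure scores, assuming the score convention described above (higher scores indicate bona fide speech). The function name `equal_error_rate` and the example scores are hypothetical and are not taken from the paper; the official ASVspoof evaluation tools compute EER and tDCF with their own procedures, which this simple threshold sweep does not reproduce.

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Estimate the EER by sweeping a threshold over every observed score.

    Assumed convention: a higher score means "more likely bona fide".
    This is an illustrative approximation, not the official ASVspoof scoring.
    """
    if not bona_fide_scores or not spoof_scores:
        raise ValueError("need at least one score from each class")

    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Hypothetical scores, for illustration only.
bona_fide = [2.0, 1.0, 0.5, -0.5]
spoofed = [-2.0, -1.0, -0.2, 0.7]
print(equal_error_rate(bona_fide, spoofed))
```

With only a handful of scores the two error rates may never cross exactly, so the sketch reports the average of the two rates at the threshold where they are closest.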

This record has no associated files available for download.

More information

Published date: 19 September 2019
Venue - Dates: Interspeech 2019, Graz, Austria, 2019-09-15 - 2019-09-19

Identifiers

Local EPrints ID: 467463
URI: http://eprints.soton.ac.uk/id/eprint/467463
PURE UUID: 76c7c58a-948e-4c9b-87d4-c28188c4ba14
ORCID for Jennifer Williams: orcid.org/0000-0003-1410-0427

Catalogue record

Date deposited: 08 Jul 2022 16:50
Last modified: 17 Mar 2024 04:12


Contributors

Author: Jennifer Williams
Author: Joanna Rownicka
