Privacy-Preserving Occupancy Estimation
In this paper, we introduce an audio-based framework for occupancy estimation, including a new public dataset, and evaluate occupancy in a ‘cocktail party’ scenario where the party is simulated by mixing audio to produce speech with overlapping talkers (1-10 people). To estimate the number of speakers in an audio clip, we explored five different types of speech signal features and trained several versions of our model using convolutional neural networks (CNNs). Further, we adapted the framework to be privacy-preserving by making random perturbations of audio frames in order to conceal speech content and speaker identity. We show that some of our privacy-preserving features perform better at occupancy estimation than original waveforms. We analyse privacy further using two adversarial tasks: speaker recognition and speech recognition. Our privacy-preserving models can estimate the number of speakers in the simulated cocktail party clips within 1-2 persons based on a mean-square error (MSE) of 0.9-1.6 and we achieve up to 34.9% classification accuracy while preserving speech content privacy. However, it is still possible for an attacker to identify individual speakers, which motivates further work in this area.
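The pipeline summarised in the abstract (mix talkers, perturb audio frames, score the count estimate) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: sine tones stand in for talkers, `mix_speakers`, `shuffle_frames` and `rmse_from_mse` are hypothetical helper names, and random reordering of fixed-length frames is only one possible reading of "random perturbations of audio frames". The last helper takes the square root of the reported MSE range, which is one plausible way to relate an MSE of 0.9-1.6 to an error of roughly one person.

```python
import math
import random


def mix_speakers(signals):
    """Sum equal-length mono signals sample by sample, then peak-normalise."""
    mixed = [sum(samples) for samples in zip(*signals)]
    peak = max(abs(x) for x in mixed) or 1.0  # avoid dividing by zero on silence
    return [x / peak for x in mixed]


def shuffle_frames(signal, frame_len, rng):
    """Cut the signal into fixed-length frames and permute their order.

    Assumption: frame reordering is used here as a stand-in perturbation;
    the paper's actual perturbation scheme may differ.
    """
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    rng.shuffle(frames)
    return [x for frame in frames for x in frame]


def rmse_from_mse(mse):
    """Convert a mean squared error into an error in the original units."""
    return math.sqrt(mse)


# Hypothetical stand-in data: three "talkers" as one-second sine tones at 8 kHz.
sample_rate = 8000
talkers = [
    [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(sample_rate)]
    for freq in (110.0, 220.0, 330.0)
]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
mixture = mix_speakers(talkers)
perturbed = shuffle_frames(mixture, frame_len=200, rng=rng)

# Square roots of the MSE range quoted in the abstract.
low, high = rmse_from_mse(0.9), rmse_from_mse(1.6)
```

Reordering frames keeps every sample of the mixture, so coarse signal statistics survive while the temporal order that carries speech content is disturbed. Whether this matches the features evaluated in the paper is not stated in this record.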
247–258
Williams, Jennifer
3a1568b4-8a0b-41d2-8635-14fe69fbb360
Yazdanpanah, Vahid
28f82058-5e51-4f56-be14-191ab5767d56
Stein, Sebastian
cb2325e7-5e63-475e-8a69-9db2dfbdb00b
5 May 2023
Williams, Jennifer, Yazdanpanah, Vahid and Stein, Sebastian (2023) Privacy-Preserving Occupancy Estimation. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE. (doi:10.1109/ICASSP49357.2023.10095340).
Record type: Conference or Workshop Item (Paper)
Text: 2023053433 - Accepted Manuscript
Restricted to Repository staff only until 5 May 2025.
More information
Accepted/In Press date: 17 February 2023
Published date: 5 May 2023
Identifiers
Local EPrints ID: 475809
URI: http://eprints.soton.ac.uk/id/eprint/475809
PURE UUID: 15adc7e0-9561-4194-89ea-07a9ec6824f0
Catalogue record
Date deposited: 28 Mar 2023 18:32
Last modified: 17 Mar 2024 04:12
Contributors
Author: Jennifer Williams
Author: Vahid Yazdanpanah
Author: Sebastian Stein