Privacy-Preserving Occupancy Estimation

In this paper, we introduce an audio-based framework for occupancy estimation, together with a new public dataset, and evaluate occupancy in a ‘cocktail party’ scenario in which the party is simulated by mixing audio to produce speech with overlapping talkers (1–10 people). To estimate the number of speakers in an audio clip, we explored five types of speech signal features and trained several versions of our model using convolutional neural networks (CNNs). We then adapted the framework to be privacy-preserving by randomly perturbing audio frames to conceal speech content and speaker identity. We show that some of our privacy-preserving features perform better at occupancy estimation than the original waveforms. We analyse privacy further using two adversarial tasks: speaker recognition and speech recognition. Our privacy-preserving models can estimate the number of speakers in the simulated cocktail party clips to within 1–2 persons, based on a mean squared error (MSE) of 0.9–1.6, and achieve up to 34.9% classification accuracy while preserving speech content privacy. However, it is still possible for an attacker to identify individual speakers, which motivates further work in this area.
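The abstract names three mechanics: simulating a cocktail party by mixing single-speaker audio, perturbing audio frames for privacy, and scoring speaker counts with MSE. The sketch below illustrates each one under stated assumptions. It is not the authors' implementation. The helper names (`mix_speakers`, `shuffle_frames`, `mse`) are hypothetical. The choice of shuffling fixed-length frames is only one simple form of "random perturbation", and the exact perturbation used in the paper may differ. Waveforms are plain lists of float samples so that the example needs only the standard library.

```python
import random
from typing import List, Optional, Sequence


def mix_speakers(clips: Sequence[Sequence[float]]) -> List[float]:
    """Simulate overlapping talkers by summing single-speaker waveforms.

    Assumption: shorter clips are implicitly zero-padded to the longest clip,
    and the mix is peak-normalised only if the sum leaves [-1, 1].
    """
    length = max(len(clip) for clip in clips)
    mixed = [0.0] * length
    for clip in clips:
        for i, sample in enumerate(clip):
            mixed[i] += sample
    peak = max(abs(s) for s in mixed) or 1.0  # avoid dividing by zero on silence
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed


def shuffle_frames(signal: Sequence[float], frame_len: int,
                   seed: Optional[int] = None) -> List[float]:
    """Hypothetical privacy perturbation: split the signal into consecutive,
    non-overlapping frames and randomly permute their order in time.

    Samples inside each frame are untouched, so short-term spectral cues
    (useful for counting voices) survive while word order is scrambled.
    """
    rng = random.Random(seed)
    frames = [list(signal[i:i + frame_len]) for i in range(0, len(signal), frame_len)]
    rng.shuffle(frames)
    return [sample for frame in frames for sample in frame]


def mse(predicted: Sequence[float], actual: Sequence[int]) -> float:
    """Mean squared error between predicted and true speaker counts."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


if __name__ == "__main__":
    mixture = mix_speakers([[0.5, 0.5], [0.25]])
    private = shuffle_frames([float(i) for i in range(10)], frame_len=3, seed=0)
    print(mixture, sorted(private), mse([2.0, 4.0], [1, 5]))
```

Because MSE is in squared units, its square root is easier to read as a count. An MSE between 0.9 and 1.6 corresponds to a root-mean-square error of roughly 0.95 to 1.26 speakers, which is consistent with the "within 1–2 persons" summary.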

10.1109/ICASSP49357.2023.10095340

247–258

IEEE

Williams, Jennifer

Yazdanpanah, Vahid

Stein, Sebastian