The University of Southampton
University of Southampton Institutional Repository

Disentangling style factors from speaker representations

Williams, Jennifer and King, Simon (2019) Disentangling style factors from speaker representations. Interspeech 2019, Graz, Austria. 15 - 19 Sep 2019. pp. 3945-3949.

Record type: Conference or Workshop Item (Paper)

Abstract

Our goal is to separate out speaking style from speaker identity in utterance-level representations of speech such as i-vectors and x-vectors. We first show that both i-vectors and x-vectors contain information not only about speaker but also about speaking style (for one data set) or emotion (for another data set), even when projected into a low-dimensional space. To disentangle these factors, we use an autoencoder in which the latent space is split into two subspaces. The entangled information about speaker and style/emotion is pushed apart by the use of auxiliary classifiers that take one of the two latent subspaces as input and that are jointly learned with the autoencoder. We evaluate how well the latent subspaces separate the factors by using them as input to separate style/emotion classification tasks. In traditional speaker identification tasks, speaker-invariant characteristics are factorized from channel and then the channel information is ignored. Our results suggest that this so-called channel may contain exploitable information, which we refer to as style factors. Finally, we propose future work to use information theory to formalize style factors in the context of speaker identity.
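The split-latent autoencoder with auxiliary classifiers described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the single linear layers, the tanh activation, the equal loss weighting, and every name in the code (`joint_loss`, `SPLIT`, and so on) are assumptions made for illustration. The sketch also assumes one classifier predicts speaker from the first subspace and the other predicts style/emotion from the second. It only computes the joint objective for one batch; training would minimise it with a gradient-based optimiser.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: none of these values come from the paper.
INPUT_DIM, LATENT_DIM, N_SPEAKERS, N_STYLES = 32, 8, 4, 3
SPLIT = LATENT_DIM // 2  # first half: speaker subspace, second half: style subspace


def init_params():
    """Random linear layers for the encoder, decoder and two auxiliary classifiers."""
    def layer(n_in, n_out):
        return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

    return {
        "enc": layer(INPUT_DIM, LATENT_DIM),
        "dec": layer(LATENT_DIM, INPUT_DIM),
        "spk": layer(SPLIT, N_SPEAKERS),             # sees only the speaker subspace
        "sty": layer(LATENT_DIM - SPLIT, N_STYLES),  # sees only the style subspace
    }


def linear(x, weights_and_bias):
    w, b = weights_and_bias
    return x @ w + b


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def joint_loss(params, x, speaker_ids, style_ids):
    """Reconstruction loss plus both auxiliary classification losses (equal weights assumed)."""
    z = np.tanh(linear(x, params["enc"]))            # latent code for each utterance vector
    z_speaker, z_style = z[:, :SPLIT], z[:, SPLIT:]  # split latent space into two subspaces
    x_hat = linear(z, params["dec"])                 # decoder uses the full latent code
    recon = np.mean((x - x_hat) ** 2)
    spk = cross_entropy(linear(z_speaker, params["spk"]), speaker_ids)
    sty = cross_entropy(linear(z_style, params["sty"]), style_ids)
    return recon + spk + sty, (z_speaker, z_style)


# Usage with random stand-ins for i-vectors/x-vectors and their labels.
params = init_params()
x = rng.normal(size=(5, INPUT_DIM))
speaker_ids = rng.integers(0, N_SPEAKERS, size=5)
style_ids = rng.integers(0, N_STYLES, size=5)
loss, (z_speaker, z_style) = joint_loss(params, x, speaker_ids, style_ids)
```

After training, the two returned subspaces would be the inputs to the separate style/emotion classification tasks that the abstract uses for evaluation.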

This record has no associated files available for download.

More information

Published date: 19 September 2019
Venue - Dates: Interspeech 2019, Graz, Austria, 2019-09-15 - 2019-09-19

Identifiers

Local EPrints ID: 467453
URI: http://eprints.soton.ac.uk/id/eprint/467453
PURE UUID: 25771078-5a48-41c2-9e7e-6243d5ba6303
ORCID for Jennifer Williams: orcid.org/0000-0003-1410-0427

Catalogue record

Date deposited: 08 Jul 2022 16:44
Last modified: 17 Mar 2024 04:12

Contributors

Author: Jennifer Williams
Author: Simon King
