University of Southampton Institutional Repository

Exploring Disentanglement with Multilingual and Monolingual VQ-VAE


Williams, Jennifer, Fong, Jason, Cooper, Erica and Yamagishi, Junichi (2021) Exploring Disentanglement with Multilingual and Monolingual VQ-VAE. 11th ISCA Speech Synthesis Workshop, Budapest, Hungary. 26 - 28 Aug 2021.

Record type: Conference or Workshop Item (Paper)

Abstract

This work examines the content and usefulness of disentangled phone and speaker representations from two separately trained VQ-VAE systems: one trained on multilingual data and another trained on monolingual data. We explore the multi- and monolingual models using four small proof-of-concept tasks: copy-synthesis, voice transformation, linguistic code-switching, and content-based privacy masking. From these tasks, we reflect on how disentangled phone and speaker representations can be used to manipulate speech in a meaningful way. Our experiments demonstrate that the VQ representations are suitable for these tasks, including creating new voices by mixing speaker representations together. We also present our novel technique to conceal the content of targeted words within an utterance by manipulating phone VQ codes, while retaining speaker identity and intelligibility of surrounding words. Finally, we discuss recommendations for further increasing the viability of disentangled representations.

This record has no associated files available for download.

More information

Published date: 28 August 2021
Venue - Dates: 11th ISCA Speech Synthesis Workshop, Budapest, Hungary, 2021-08-26 - 2021-08-28

Identifiers

Local EPrints ID: 467441
URI: http://eprints.soton.ac.uk/id/eprint/467441
PURE UUID: be6be53e-9370-45b3-b5f5-9211a4a0b960
ORCID for Jennifer Williams: orcid.org/0000-0003-1410-0427

Catalogue record

Date deposited: 08 Jul 2022 16:40
Last modified: 17 Mar 2024 04:12

Contributors

Author: Jennifer Williams
Author: Jason Fong
Author: Erica Cooper
Author: Junichi Yamagishi
