Exploring Disentanglement with Multilingual and Monolingual VQ-VAE
This work examines the content and usefulness of disentangled phone and speaker representations from two separately trained VQ-VAE systems: one trained on multilingual data and another trained on monolingual data. We explore the multilingual and monolingual models using four small proof-of-concept tasks: copy-synthesis, voice transformation, linguistic code-switching, and content-based privacy masking. From these tasks, we reflect on how disentangled phone and speaker representations can be used to manipulate speech in a meaningful way. Our experiments demonstrate that the VQ representations are suitable for these tasks, including creating new voices by mixing speaker representations together. We also present a novel technique for concealing the content of targeted words within an utterance by manipulating phone VQ codes, while retaining speaker identity and the intelligibility of surrounding words. Finally, we discuss recommendations for further increasing the viability of disentangled representations.
Williams, Jennifer
Fong, Jason
Cooper, Erica
Yamagishi, Junichi
28 August 2021
Williams, Jennifer, Fong, Jason, Cooper, Erica and Yamagishi, Junichi (2021) Exploring Disentanglement with Multilingual and Monolingual VQ-VAE. 11th ISCA Speech Synthesis Workshop, Budapest, Hungary. 26 - 28 Aug 2021.
Record type: Conference or Workshop Item (Paper)
More information
Published date: 28 August 2021
Venue - Dates: 11th ISCA Speech Synthesis Workshop, Budapest, Hungary, 2021-08-26 - 2021-08-28
Identifiers
Local EPrints ID: 467441
URI: http://eprints.soton.ac.uk/id/eprint/467441
PURE UUID: be6be53e-9370-45b3-b5f5-9211a4a0b960
Catalogue record
Date deposited: 08 Jul 2022 16:40
Last modified: 17 Mar 2024 04:12
Contributors
Author: Jennifer Williams
Author: Jason Fong
Author: Erica Cooper
Author: Junichi Yamagishi