University of Southampton Institutional Repository

End-to-End Signal Factorization for Speech: Identity, Content, and Style

Williams, Jennifer

Williams, Jennifer (2021) End-to-End Signal Factorization for Speech: Identity, Content, and Style. IJCAI'20: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan. 11-17 Jul 2020. pp. 5212-5213.

Record type: Conference or Workshop Item (Paper)

Abstract

Preliminary experiments in this dissertation show that it is possible to factorize specific types of information from the speech signal in an abstract embedding space using machine learning. This information includes characteristics of the recording environment, speaking style, and speech quality. Based on these findings, a new technique is proposed to factorize multiple types of information from the speech signal simultaneously using a combination of state-of-the-art machine learning methods for speech processing. Successful speech signal factorization will lead to advances across many speech technologies, including improved speaker identification, detection of speech audio deep fakes, and controllable expression in speech synthesis.

This record has no associated files available for download.

More information

Published date: 8 December 2021
Additional Information: ISBN 9780999241165
Venue - Dates: IJCAI'20: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, Yokohama, Japan, 2020-07-11 - 2020-07-17

Identifiers

Local EPrints ID: 467426
URI: http://eprints.soton.ac.uk/id/eprint/467426
PURE UUID: e775b177-0e99-4a95-90dc-8f93feccb4bd
ORCID for Jennifer Williams: orcid.org/0000-0003-1410-0427

Catalogue record

Date deposited: 08 Jul 2022 16:33
Last modified: 23 Feb 2023 03:27

Contributors

Author: Jennifer Williams
