On the latency of voice conversion in an active voice cloning device
Although significant progress has been made in voice conversion (VC), the presence of the source speaker’s original voice becomes problematic, especially in a real-time voice cloning scenario where listeners are close to the source speaker. The overlap of the source speaker’s voice and the converted voice degrades the sense of immersion in the converted voice. In this paper, we conceptualize an active voice cloning (AVC) device, which can convert one’s voice timbre to another’s while confining the source speaker’s voice with active noise control (ANC). The VC system is realized through a low-latency deep neural network model, and the ANC system is constructed as a feedforward single-channel implementation. The mockup of the AVC device is assembled in a short open tube that can be worn over the source speaker’s mouth. Because the latency in the VC system introduces a phase difference between the converted voice and the residual voice of the source speaker, we further assess its effect on the intelligibility of the converted voice, as well as the overall performance of the AVC device in improving the perceptual experience.
International Commission for Acoustics
Irihose, Obed
3000079b-9cbf-4acb-b08f-085d450e96a2
Xie, Rong
c236a271-fe47-4fdb-b1ed-2598ef36ed4d
Shi, Chuang
c46f72bd-54c7-45ee-ac5d-285691fccf81
24 October 2022
Irihose, Obed, Xie, Rong and Shi, Chuang (2022) On the latency of voice conversion in an active voice cloning device. In 24th International Congress on Acoustics Proceedings. International Commission for Acoustics. 7 pp.
Record type: Conference or Workshop Item (Paper)
Text: ICA24_VC_Submission_ABS-0679 - Accepted Manuscript (restricted to repository staff only)
More information
Published date: 24 October 2022
Venue - Dates: 24th International Congress on Acoustics, Hwabaek International Convention Center, Gyeongju, Republic of Korea, 2022-10-24 - 2022-10-28
Identifiers
Local EPrints ID: 484696
URI: http://eprints.soton.ac.uk/id/eprint/484696
PURE UUID: 69447eb2-ba55-4cfc-82ce-2371b33c5c8f
Catalogue record
Date deposited: 20 Nov 2023 17:43
Last modified: 18 Mar 2024 04:13
Contributors
Author: Obed Irihose
Author: Rong Xie
Author: Chuang Shi