University of Southampton Institutional Repository

One-for-all: towards universal domain translation with a single StyleGAN


Du, Yong, Zhan, Jiahui, Li, Xinzhe, Dong, Junyu, Chen, Sheng, Yang, Ming-Hsuan and He, Shengfeng (2025) One-for-all: towards universal domain translation with a single StyleGAN. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47 (4), 2865-2881. (doi:10.1109/TPAMI.2025.3530099).

Record type: Article

Abstract

In this paper, we propose a novel translation model, UniTranslator, for transforming representations between visually distinct domains under conditions of limited training data and significant visual differences. The main idea behind our approach is to leverage the domain-neutral capabilities of CLIP as a bridging mechanism, while using a separate module to extract abstract, domain-agnostic semantics from the embeddings of both the source and target domains. Fusing these abstract semantics with target-specific semantics results in a transformed embedding within the CLIP space. To bridge the gap between the disparate spaces of CLIP and StyleGAN, we introduce a new non-linear mapper, the CLIP2P mapper. Utilizing CLIP embeddings, this module is tailored to approximate the latent distribution of StyleGAN's latent space, effectively acting as a connector between the two spaces. The proposed UniTranslator is versatile and capable of performing various tasks, including style mixing, stylization, and translation, even in visually challenging scenarios across different visual domains. Notably, UniTranslator generates high-quality translations that showcase domain relevance, diversity, and improved image quality. UniTranslator surpasses existing general-purpose models and performs well against specialized models in representative tasks. The source code and trained models will be released to the public.
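
The pipeline described in the abstract can be summarized in a short illustrative sketch. This is a minimal outline under stated assumptions, not the authors' released implementation; all module names, dimensions, and helper functions below are hypothetical.

# Illustrative sketch only: names, dimensions, and helpers are hypothetical assumptions.
import torch
import torch.nn as nn

class CLIP2PMapper(nn.Module):
    """Non-linear MLP that maps a CLIP embedding to a StyleGAN-style latent code,
    acting as the bridge between the two spaces described in the abstract."""
    def __init__(self, clip_dim=512, latent_dim=512, n_layers=4):
        super().__init__()
        layers, dim = [], clip_dim
        for _ in range(n_layers - 1):
            layers += [nn.Linear(dim, latent_dim), nn.LeakyReLU(0.2)]
            dim = latent_dim
        layers.append(nn.Linear(dim, latent_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, clip_embedding):
        return self.net(clip_embedding)

def translate(source_clip, target_clip, extract_abstract_semantics, fuse,
              mapper, stylegan_generator):
    """High-level flow: domain-agnostic semantics extracted from the source and
    target CLIP embeddings are fused with target-specific semantics, then mapped
    into StyleGAN's latent space to synthesize the translated image."""
    abstract_sem = extract_abstract_semantics(source_clip, target_clip)
    fused_clip = fuse(abstract_sem, target_clip)   # transformed embedding in CLIP space
    latent = mapper(fused_clip)                    # CLIP space -> StyleGAN latent space
    return stylegan_generator(latent)              # translated image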

Text: TPAMI-main-final - Accepted Manuscript - Download (36MB)
Text: TPRMI2025-Apr - Version of Record - Download (8MB)

More information

Accepted/In Press date: 8 January 2025
e-pub ahead of print date: 21 January 2025
Published date: 6 March 2025
Additional Information: Publisher Copyright: © 1979-2012 IEEE.
Keywords: GAN Embedding, Generative Adversarial Networks, Image-to-Image Translation

Identifiers

Local EPrints ID: 498190
URI: http://eprints.soton.ac.uk/id/eprint/498190
ISSN: 1939-3539
PURE UUID: 510eff5d-fd4d-4fea-88c9-93cf5ec30a87

Catalogue record

Date deposited: 12 Feb 2025 17:37
Last modified: 21 Aug 2025 03:27

Contributors

Author: Yong Du
Author: Jiahui Zhan
Author: Xinzhe Li
Author: Junyu Dong
Author: Sheng Chen
Author: Ming-Hsuan Yang
Author: Shengfeng He
