GTV: generating tabular data via vertical federated learning

Synthetic data has emerged as a promising avenue for privacy-preserving data sharing. However, constructing synthetic data generators necessitates access to the real dataset, posing challenges, particularly when data features are disparately distributed across different organizations.
Vertical Federated Learning (VFL) is a collaborative approach to training machine learning models among distinct tabular data holders, such as financial institutions, who possess disjoint features for the same group of customers. In this paper, we introduce the GTV framework for Generating Tabular Data via Vertical Federated Learning and demonstrate that VFL can be successfully used to implement GANs for distributed tabular data in a privacy-preserving manner, with performance close to centralized GANs which assume shared data. We make design choices with respect to the distribution of GAN generator and discriminator models, and we introduce a training-with-shuffling technique so that no party can reconstruct training data from the GAN conditional vector. The paper presents (1) an implementation of GTV, (2) a detailed quality evaluation of the GTV-generated synthetic data, (3) an examination of the GTV framework under different data distributions and numbers of clients, and (4) an analysis of GTV's robustness against Membership Inference Attacks under different Differential Privacy settings, for a range of datasets with diverse distribution characteristics. Our results demonstrate that GTV can consistently generate high-fidelity synthetic tabular data of comparable quality to that generated by a centralized GAN algorithm. The difference in machine learning utility can be as low as 2.7%, even under extremely imbalanced data distributions across clients. Code is available at: https://github.com/zhao-zilong/gtv

Zhao, Zilong

ac186929-4179-4cb5-9a00-48ccea814626

Wu, Han

df26f7c9-c15d-4c37-baa3-68bc19e1d74b

van Moorsel, Aad

7a10ae28-b1df-4cb7-8200-f0654ae616a5

Chen, Lydia Y.

4509d882-37b6-4094-a88e-b05a107b5db7

23 June 2025

Zhao, Zilong, Wu, Han, van Moorsel, Aad and Chen, Lydia Y. (2025) GTV: generating tabular data via vertical federated learning. In The 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2025). 14 pp.

Record type: Conference or Workshop Item (Paper)

Abstract

Text

GTV___DSN2025 - Accepted Manuscript

Available under License Creative Commons Attribution.

More information

Published date: 23 June 2025

Venue - Dates: The 55th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, Naples, Italy, 2025-06-23 - 2025-06-26

Related URLs:

https://dsn2025.github.io/cpac...epted.html

Identifiers

Local EPrints ID: 500979

URI: http://eprints.soton.ac.uk/id/eprint/500979

PURE UUID: c1e9d002-46a1-4de9-8fbf-40b103e38843

Catalogue record

Date deposited: 20 May 2025 16:40

Last modified: 20 May 2025 16:41

Contributors

Author: Zilong Zhao

Author: Han Wu

Author: Aad van Moorsel

Author: Lydia Y. Chen
