University of Southampton Institutional Repository

Clustering-based validation splits for model selection under domain shift


Napoli, Andrea and White, Paul (2025) Clustering-based validation splits for model selection under domain shift. Transactions on Machine Learning Research.

Record type: Article

Abstract

This paper considers the problem of model selection under domain shift. Motivated by principles from distributionally robust optimisation and domain adaptation theory, it is proposed that the training-validation split should maximise the distribution mismatch between the two sets. By adopting the maximum mean discrepancy (MMD) as the measure of mismatch, it is shown that the partitioning problem reduces to kernel k-means clustering. A constrained clustering algorithm, which leverages linear programming to control the size, label, and (optionally) group distributions of the splits, is presented. The algorithm does not require additional metadata, and comes with convergence guarantees. In experiments, the technique consistently outperforms alternative splitting strategies across a range of datasets and training algorithms, for both domain generalisation and unsupervised domain adaptation tasks. Analysis also shows the MMD between the training and validation sets to be well-correlated with test domain accuracy, further substantiating the validity of this approach.
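
To make the link between MMD and clustering concrete, here is a minimal NumPy sketch under stated assumptions. It computes a biased estimate of the squared MMD between two halves of a dataset and compares a two-cluster kernel k-means split against a random split. It is not the authors' algorithm: the constrained, linear-programming step described in the abstract is omitted, and the function names (`rbf_kernel`, `mmd2`, `kernel_kmeans_split`), the RBF kernel, the `gamma` value, the seeding rule, and the toy data are all illustrative choices that do not come from the paper.

```python
import numpy as np


def rbf_kernel(X, gamma=0.5):
    """Gram matrix of the RBF kernel exp(-gamma * ||x - y||^2)."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def mmd2(K, mask):
    """Biased squared MMD between the points selected by `mask` and the rest."""
    a, b = mask, ~mask
    return (K[np.ix_(a, a)].mean() + K[np.ix_(b, b)].mean()
            - 2.0 * K[np.ix_(a, b)].mean())


def kernel_kmeans_split(K, n_iter=50):
    """Unconstrained two-cluster kernel k-means; returns a boolean mask."""
    n = K.shape[0]
    # Seeding rule (an assumption): point 0 and the point least similar to it.
    far = int(np.argmin(K[0]))
    labels = (K[:, far] > K[:, 0]).astype(int)
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, 2), np.inf)
        for c in (0, 1):
            idx = labels == c
            if idx.any():
                # ||phi(x) - mu_c||^2 expanded in terms of kernel evaluations.
                dist[:, c] = (diag - 2.0 * K[:, idx].mean(axis=1)
                              + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels == 1


# Toy data (hypothetical): two well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(6.0, 0.5, size=(20, 2))])
K = rbf_kernel(X)

cluster_mask = kernel_kmeans_split(K)
random_mask = rng.permutation(len(X)) < len(X) // 2

print("MMD^2, clustered split:", mmd2(K, cluster_mask))
print("MMD^2, random split:   ", mmd2(K, random_mask))
```

On data like this, the clustered split should report a markedly larger squared MMD than the random split, which is the property the abstract proposes to exploit when choosing a validation set.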

Text
Clustering-Based Validation Splits for Model Selection under Domain Shift - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 1 August 2025
Published date: 17 August 2025

Identifiers

Local EPrints ID: 505057
URI: http://eprints.soton.ac.uk/id/eprint/505057
PURE UUID: a6610fd5-30d9-4502-822a-98bd3630f3e7
ORCID for Paul White: orcid.org/0000-0002-4787-8713

Catalogue record

Date deposited: 25 Sep 2025 16:52
Last modified: 26 Sep 2025 01:33

Contributors

Author: Andrea Napoli
Author: Paul White
