Efficient strategies for engineering data re-use

One of the central challenges of machine learning is overfitting, whereby algorithms model noise or random fluctuations instead of the true underlying patterns. In engineering design, factors such as noise, uncertainty and complex, nonlinear responses exacerbate the problem. However, the predominant concern remains the prohibitive cost of generating sufficient training data.

This thesis demonstrates the merits of data reuse in addressing the issue of training data sparsity. Leveraging historical data leads to an increase in the effective size and quality of the training set, improving predictive accuracy and enabling a more thorough exploration of the design space. Transfer optimisation, a recently introduced paradigm for engineering data reuse, serves as a mathematical basis, and several surrogate models are demonstrated as its practical realisations. The thesis makes two contributions regarding the effective integration and reuse of historical data.

The first contribution concerns cases where the response values of the historical and new data sources are uncorrelated, a condition shown to be prevalent in engineering design because even the most subtle topological modifications between data sources can produce pronounced changes in the objective function topography. A novel calibration method is introduced. Results demonstrate that the technique can halve optimisation cost when the relationship between data sources is linear.

The second contribution deals with situations where the parameterisations of historical and new data sources differ, with no overlap or known mapping between them. The novel method uses simulation physics, rather than a mapping function, to represent the distribution of source and target samples. It then employs a t-SNE-inspired optimisation routine to recreate this distribution in a specified design variable space. Multiple-output Gaussian processes are used to model the resulting distribution of target and source samples. When used as part of the efficient global optimisation algorithm, the proposed approach reduces cost significantly, by between 30% and 60%, relative to a Kriging-based method with no data reuse.
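For readers unfamiliar with efficient global optimisation, the sketch below shows the expected improvement criterion that is commonly used to choose the next design to evaluate from a Kriging-type surrogate. The abstract does not state which infill criterion the thesis uses, so this is an illustrative assumption only; the function name `expected_improvement` and its arguments are hypothetical and not taken from the thesis.

```python
import math


def expected_improvement(mu, sigma, y_best):
    """Expected improvement at one candidate design, for minimisation.

    mu, sigma: predictive mean and standard deviation from a surrogate
               model (for example Kriging) at the candidate point.
    y_best:    lowest objective value observed so far.
    Assumption: standard closed form under a Gaussian predictive distribution.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(y_best - mu, 0.0)
    z = (y_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (y_best - mu) * cdf + sigma * pdf


# Illustrative values only: a candidate predicted slightly worse than the
# incumbent can still score well if its predictive uncertainty is large.
uncertain = expected_improvement(mu=1.1, sigma=0.5, y_best=1.0)
confident = expected_improvement(mu=1.1, sigma=0.01, y_best=1.0)
```

In a data-reuse setting, better predictions of `mu` and `sigma` from a surrogate informed by historical data would be expected to steer this criterion towards promising designs with fewer new simulations.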

University of Southampton

Cimpoesu, Petru-Cristian

7c428386-b15e-4aa3-a5a5-14dd1317b06d

2025

Toal, David

dc67543d-69d2-4f27-a469-42195fa31a68

Keane, Andy

26d7fa33-5415-4910-89d8-fb3620413def

Wang, Leran

91d2f4ca-ed47-4e47-adff-70fef3874564

Cimpoesu, Petru-Cristian (2025) Efficient strategies for engineering data re-use. University of Southampton, Doctoral Thesis, 131pp.

Record type: Thesis (Doctoral)

Text

data_reuse_phd_thesis - Accepted Manuscript

Restricted to Repository staff only until 17 September 2028.

Text

Final-thesis-submission-Examination-Mr-Petru-Cristian-Cimpoesu

Restricted to Repository staff only

More information

Published date: 2025

Keywords: t-SNE, Transfer optimisation, Kriging, Surrogate modelling, machine learning (artificial intelligence)

Identifiers

Local EPrints ID: 504782

URI: http://eprints.soton.ac.uk/id/eprint/504782

PURE UUID: 3762229e-df84-4218-8d71-22107d826cf6

ORCID for David Toal:

orcid.org/0000-0002-2203-0302

ORCID for Andy Keane:

orcid.org/0000-0001-7993-1569

Catalogue record

Date deposited: 18 Sep 2025 17:06

Last modified: 08 Jan 2026 02:44

Contributors

Author: Petru-Cristian Cimpoesu

Thesis advisor: David Toal

Thesis advisor: Andy Keane

Thesis advisor: Leran Wang
