Efficient strategies for engineering data re-use
One of the central challenges of machine learning is overfitting, whereby algorithms model noise or random fluctuations instead of the true underlying patterns. In engineering design, factors such as noise, uncertainty and complex, nonlinear responses exacerbate the problem. However, the predominant concern remains the prohibitive cost of generating sufficient training data.
This thesis demonstrates the merits of data reuse in addressing training-data sparsity. Leveraging historical data increases the effective size and quality of the training set, which improves predictive accuracy and enables a more thorough exploration of the design space. Transfer optimisation, a recently introduced paradigm for engineering data reuse, serves as the mathematical basis, and several surrogate models are demonstrated as its practical realisations. The thesis makes two contributions to the effective integration and reuse of historical data.
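Kriging, which the abstract names as the no-reuse baseline, is the archetypal surrogate model of this kind. The following is a minimal ordinary-Kriging sketch in Python/NumPy, not the thesis's implementation. It assumes a Gaussian correlation function with hand-picked hyperparameters `theta` (in practice these are usually tuned by maximum likelihood) and a small nugget for numerical stability. The function names and the toy data are hypothetical.

```python
import numpy as np

def gaussian_corr(a, b, theta):
    """Gaussian correlation R_ij = exp(-sum_k theta_k * (a_ik - b_jk)^2)."""
    d2 = (a[:, None, :] - b[None, :, :]) ** 2
    return np.exp(-np.sum(theta * d2, axis=-1))

def fit_ordinary_kriging(x, y, theta, nugget=1e-10):
    """Fit an ordinary-Kriging predictor with fixed (assumed, not optimised) hyperparameters."""
    n = len(x)
    r = gaussian_corr(x, x, theta) + nugget * np.eye(n)
    ones = np.ones(n)
    # Generalised least-squares estimate of the constant mean mu.
    mu = (ones @ np.linalg.solve(r, y)) / (ones @ np.linalg.solve(r, ones))
    # Weights R^{-1} (y - 1*mu), reused for every prediction.
    weights = np.linalg.solve(r, y - mu * ones)
    return {"x": x, "theta": theta, "mu": mu, "weights": weights}

def predict(model, x_new):
    """Kriging mean prediction: mu + r(x)^T R^{-1} (y - 1*mu)."""
    r = gaussian_corr(x_new, model["x"], model["theta"])
    return model["mu"] + r @ model["weights"]

# Hypothetical 1-D target data: five samples of a toy response.
x = np.linspace(0.0, 1.0, 5)[:, None]
y = np.sin(6.0 * x[:, 0])
model = fit_ordinary_kriging(x, y, theta=np.array([10.0]))
y_mid = predict(model, np.array([[0.375]]))
```

Naive data reuse would simply stack historical rows onto `x` and `y` before fitting. That is only meaningful when both sources share a parameterisation and a response scale, which are exactly the conditions the two contributions relax.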
The first contribution concerns cases where the response values of the historical and new data sources are uncorrelated. This condition is shown to be prevalent in engineering design, because even the most subtle topological modifications between data sources can produce pronounced changes in the topography of the objective function. A novel calibration method is introduced for these cases, and results show that it can halve the optimisation cost when the relationship between the data sources is linear.
The second contribution addresses situations where the historical and new data sources are parameterised differently, with no overlap or known mapping between them. Instead of a mapping function, the proposed method uses simulation physics to represent the distribution of source and target samples, then employs a t-SNE-inspired optimisation routine to recreate this distribution in a specified design variable space. Multiple-output Gaussian processes model the resulting distribution of target and source samples. Used within the efficient global optimisation (EGO) algorithm, the approach reduces cost by between 30 and 60% relative to a Kriging-based method without data reuse.
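The embedding step of the second contribution can be illustrated with a t-SNE-style objective. The sketch below is an assumption-laden illustration, not the thesis's routine. It takes hypothetical physics-based feature vectors for all samples (for example, summaries of simulated flow fields) and builds symmetric Gaussian affinities `P` with a single fixed bandwidth, instead of t-SNE's perplexity calibration. It then places the source samples in a 2-D design-variable space by minimising KL(P || Q), where `Q` uses Student-t affinities. The target samples stay fixed at their known design coordinates. Plain gradient descent with step halving is used, so every accepted step lowers the cost. All names and data are hypothetical.

```python
import numpy as np

def joint_p(features, sigma=1.0):
    """Symmetric Gaussian affinities P from physics-based features.

    Assumption: one fixed bandwidth sigma replaces t-SNE's per-point perplexity search.
    """
    d2 = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    p = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(p, 0.0)
    return p / p.sum()

def joint_q(y):
    """Student-t (one degree of freedom) affinities Q in design space, plus the unnormalised kernel."""
    d2 = np.sum((y[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    w = 1.0 / (1.0 + d2)
    np.fill_diagonal(w, 0.0)
    return w / w.sum(), w

def kl_divergence(p, q):
    """KL(P || Q) summed over pairs with p_ij > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def kl_gradient(p, y):
    """t-SNE gradient: 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + |y_i - y_j|^2)."""
    q, w = joint_q(y)
    diff = y[:, None, :] - y[None, :, :]
    return 4.0 * np.sum(((p - q) * w)[:, :, None] * diff, axis=1)

def embed_sources(features, target_xy, iters=200, lr=1.0, seed=0):
    """Place source samples in the target design space while target rows stay fixed.

    Assumption: rows of `features` are ordered [target samples, source samples].
    """
    n_target, dim = target_xy.shape
    n_source = len(features) - n_target
    rng = np.random.default_rng(seed)
    # Start source samples near the centroid of the target samples.
    start = target_xy.mean(axis=0) + rng.normal(scale=0.1, size=(n_source, dim))
    y = np.vstack([target_xy, start])
    free = slice(n_target, None)  # only source rows are optimised
    p = joint_p(features)
    cost = kl_divergence(p, joint_q(y)[0])
    history = [cost]
    for _ in range(iters):
        grad = kl_gradient(p, y)
        step = lr
        # Halve the step until the cost decreases or the step becomes negligible.
        while step > 1e-12:
            trial = y.copy()
            trial[free] -= step * grad[free]
            trial_cost = kl_divergence(p, joint_q(trial)[0])
            if trial_cost < cost:
                y, cost = trial, trial_cost
                break
            step *= 0.5
        history.append(cost)
    return y, history

# Hypothetical data: 4 target and 6 source samples, each with 5 physics-based descriptors.
rng = np.random.default_rng(1)
features = rng.normal(size=(10, 5))
target_xy = rng.uniform(size=(4, 2))
coords, history = embed_sources(features, target_xy)
```

The resulting coordinates put source and target samples in one shared design space, which a multiple-output Gaussian process could then model. That modelling step is not shown here.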
Keywords: t-SNE, Transfer optimisation, Kriging, Surrogate modelling, machine learning (artificial intelligence)
University of Southampton
Cimpoesu, Petru-Cristian
7c428386-b15e-4aa3-a5a5-14dd1317b06d
2025
Toal, David
dc67543d-69d2-4f27-a469-42195fa31a68
Keane, Andy
26d7fa33-5415-4910-89d8-fb3620413def
Wang, Leran
91d2f4ca-ed47-4e47-adff-70fef3874564
Cimpoesu, Petru-Cristian (2025) Efficient strategies for engineering data re-use. University of Southampton, Doctoral Thesis, 131pp.
Record type: Thesis (Doctoral)
Text: data_reuse_phd_thesis (Accepted Manuscript). Restricted to Repository staff only until 17 September 2028.
Text: Final-thesis-submission-Examination-Mr-Petru-Cristian-Cimpoesu. Restricted to Repository staff only.
More information
Published date: 2025
Identifiers
Local EPrints ID: 504782
URI: http://eprints.soton.ac.uk/id/eprint/504782
PURE UUID: 3762229e-df84-4218-8d71-22107d826cf6
Catalogue record
Date deposited: 18 Sep 2025 17:06
Last modified: 19 Sep 2025 01:43
Contributors
Author: Petru-Cristian Cimpoesu