The University of Southampton
University of Southampton Institutional Repository

The importance of error measures for machine learning regression to approximate the ground truth


Parkes, Amy (2021) The importance of error measures for machine learning regression to approximate the ground truth. University of Southampton, Doctoral Thesis, 141pp.

Record type: Thesis (Doctoral)

Abstract

As machine learning technology improves, it is increasingly relied upon when making significant decisions which require a high level of trust. Accuracy and interpretability are paramount for trust in regression methods, which comprise a large portion of the field. To apply these methods with confidence, there needs to be certainty that they have modelled the ground truth of a dataset, that is, the correct input-output relationships. Conventional regression error measures, however, do not ensure that the correct relationships are modelled, as they only require accurate point predictions to assign low error to a method. A case study of power prediction for merchant vessels is used to illustrate the problem: both accurate prediction and correct modelling of the input-output relationships are required, although there is limited understanding of these relationships. For this problem, neural networks can produce predictions with a 2% Mean Absolute Relative Error, which is low enough for use in fuel-saving devices on board vessels in operation. The methods developed in this thesis have been deployed on over a dozen merchant vessels operated by Shell Shipping and Maritime, saving over a quarter of a million tonnes of CO2 emissions in 2020. However, the predictions are not interpretable, as the input-output relationships modelled are not consistent or correct. A new error measure, the Mean Fit to Median Error, is investigated; it ensures that networks approximate the conditional averages and is applicable to any dataset. It is verified on 36 artificial datasets, where the ground truth is known, and is shown to have a correlation with the ground truth that is on average 60% higher than that of traditional error measures. The Mean Fit to Median Error is then applied to the ship powering example and shows a shift in the approximated relationships for the same Mean Absolute Relative Error values, indicating an improvement in determining the ground truth. Networks reporting low Mean Fit to Median errors model more consistent and correct input-output relationships and are robust to areas of sparse data.
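
The abstract's central claim, that a low point-prediction error does not guarantee that the correct input-output relationships have been modelled, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the thesis: the toy data, the two hand-written models, and the formula used for the Mean Absolute Relative Error (the mean of |prediction - truth| / |truth|, which may differ in detail from the thesis's definition). The Mean Fit to Median Error is not implemented here, since the abstract does not define it.

```python
import numpy as np


def mean_absolute_relative_error(y_true, y_pred):
    """Assumed definition: mean of |y_pred - y_true| / |y_true| (0.02 == 2%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true) / np.abs(y_true)))


# Hypothetical toy dataset: two inputs that are perfectly correlated in the
# sample, and an output that (by construction) depends on x1 only.
x1 = np.linspace(1.0, 10.0, 50)
x2 = x1.copy()
y = 2.0 * x1


def model_a(x1, x2):
    """Hypothetical model that uses the correct input."""
    return 2.0 * np.asarray(x1, dtype=float)


def model_b(x1, x2):
    """Hypothetical model that attributes the effect to the wrong input."""
    return 2.0 * np.asarray(x2, dtype=float)


# On the observed data both models are indistinguishable by point error.
mare_a = mean_absolute_relative_error(y, model_a(x1, x2))
mare_b = mean_absolute_relative_error(y, model_b(x1, x2))

# Probing the relationships directly (vary x1, hold x2 fixed) separates them:
# model_a responds to x1, model_b does not.
probe_x1 = np.array([2.0, 4.0])
probe_x2 = np.array([3.0, 3.0])
response_a = model_a(probe_x1, probe_x2)
response_b = model_b(probe_x1, probe_x2)

print(f"MARE model_a: {mare_a:.3f}, MARE model_b: {mare_b:.3f}")
print("model_a response to x1:", response_a)
print("model_b response to x1:", response_b)
```

Both models score identically on the sample, yet only one has learned the relationship the data were generated from. This is the kind of gap that an error measure targeting the ground truth is intended to expose.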

Text: FINAL THESIS_with_copyright_anonymised - Version of Record (20MB)
Available under the University of Southampton Thesis Licence.
Text: Permission to deposit thesis
Restricted to repository staff only. Available under the University of Southampton Thesis Licence.

More information

Submitted date: 5 August 2021

Identifiers

Local EPrints ID: 456192
URI: http://eprints.soton.ac.uk/id/eprint/456192
PURE UUID: 83f1fe0a-cfb6-4029-8d7b-d2281cbdc6d8
ORCID for Dominic Hudson: orcid.org/0000-0002-2012-6255

Catalogue record

Date deposited: 26 Apr 2022 16:41
Last modified: 17 Mar 2024 02:41

Contributors

Author: Amy Parkes
Thesis advisor: Dominic Hudson
