University of Southampton Institutional Repository

Data matters: Towards a data-centric theory of generalisation


Marcu, Antonia (2022) Data matters: Towards a data-centric theory of generalisation. University of Southampton, Doctoral Thesis, 166pp.

Record type: Thesis (Doctoral)

Abstract

The ability of a learning machine to perform well on data outside its training set is referred to as its generalisation performance. Despite being researched for many years, generalisation remains one of the key unresolved puzzles in machine learning. In this thesis we start building the understanding needed to construct a new framework for reasoning about generalisation. We start with a theoretical perspective but conclude that the field needs to build stronger intuitions before being able to formalise generalisation in a meaningful way. Our theoretical exploration, however, highlights that the data plays a much more central role than previously acknowledged. To better understand how the data can be incorporated in generalisation studies, we explore the practice of modifying images. The modifications we consider are mixed data augmentation, patch-shuffling, and patch-based occlusion. We find that there are a number of incorrect implicit assumptions in the literature regarding the side effects of data modification. Because these assumptions do not hold, some distortion-based approaches to evaluating model attributes are themselves incorrect. In the case of modifying data to assess robustness to occlusion, we propose a solution that addresses the side effects. The existence of these incorrect assumptions shows that the field has a poor understanding of data modification. Despite this limited understanding, data distortion has recently been used to empirically predict generalisation performance. We focus on this practice and claim that data modification has been used carelessly in this case as well. We argue that it is the limited evaluation settings that caused the modification-based predictors to appear successful despite relying on poorly founded intuitions. We end by proposing the backbone for an extensive evaluation of empirical predictors of generalisation.
We believe that such a practical approach to generalisation, when thoroughly designed, has the potential to provide the understanding needed to create a theoretical framework in future. Our proposed evaluation setting seeks to explore a variety of data-centric scenarios, highlighting the central role played by the data in the generalisation puzzle.

Text
Thesis-a3b - Version of Record
Available under License University of Southampton Thesis Licence.
Text
Final-thesis-submission-Examination-Miss-Antonia-Marcu
Restricted to Repository staff only

More information

Published date: 2022

Identifiers

Local EPrints ID: 481319
URI: http://eprints.soton.ac.uk/id/eprint/481319
PURE UUID: 196bfb16-525f-44ec-84bc-041d1b60fe17

Catalogue record

Date deposited: 23 Aug 2023 16:48
Last modified: 16 Mar 2024 23:50

Contributors

Author: Antonia Marcu
Thesis advisor: Adam Prugel-Bennett
