Towards an understanding of generalisation in deep learning: an analysis of the transformation of information in convolutional neural networks
Belcher, Dominic (2025) Towards an understanding of generalisation in deep learning: an analysis of the transformation of information in convolutional neural networks. University of Southampton, Masters Thesis, 107pp.
Record type: Thesis (Masters)
Abstract
Despite their enormous size, deep neural networks achieve exceptional performance across a wide variety of problems and have become a de facto standard in many areas of machine learning. The ability of such large models to reliably achieve good generalisation is difficult to reconcile with conventional machine learning theory, which bounds generalisation capability by model size, implying that more complex models should not (though, importantly, not that they cannot) reliably generalise.
In this work, I investigate generalisation within the specific domain of Convolutional Neural Networks (CNNs) applied to image classification. I examine how the layers of a CNN transform the data, and how this transformation may account for the good generalisation these models exhibit. I study how margins between classes manifest and change, showing that the different operations in the network can increase or decrease the margin, as well as change the shape of the data relative to the margin. I combine this with a replication and extension of the use of hidden layer probes to investigate how the classification problem changes through the network, showing that linear separability emerges progressively, to an extent that almost matches the full classification performance of the network. I show how this linear separability aligns with some of the patterns seen in the class margins, and how the convolutions and activations work in tandem to increase both the margin and the linear separability. Finally, I extend the existing work on hidden layer probes to investigate globally pooled features within the model, showing that the information distilled by the network at each stage resides primarily in coarse features rather than at the pixel level.
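As a concrete illustration of the probing methodology summarised above, the following is a minimal sketch, not code from the thesis: the toy model, the layer choice, and the training loop are assumptions made for illustration only. It freezes a small CNN and trains a linear classifier (a probe) on the activations of a chosen hidden layer, with an option to globally average-pool the activations first, mirroring the pooled-feature probes described in the abstract.

```python
# Minimal sketch of hidden-layer linear probes (illustrative, not the thesis code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """Toy CNN for 32x32 RGB inputs; forward() can return hidden activations."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.fc = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x, return_hidden=False):
        h1 = F.max_pool2d(F.relu(self.conv1(x)), 2)   # 32 x 16 x 16
        h2 = F.max_pool2d(F.relu(self.conv2(h1)), 2)  # 64 x 8 x 8
        logits = self.fc(h2.flatten(1))
        if return_hidden:
            return logits, [h1, h2]
        return logits

def fit_probe(model, loader, layer_idx, num_classes=10, pooled=False,
              epochs=5, device="cpu"):
    """Train a linear probe on the activations of one hidden layer.

    With pooled=True the activations are globally average-pooled first,
    so the probe sees only coarse per-channel features, not per-pixel ones.
    """
    model.eval()  # the network itself stays frozen
    probe, opt = None, None
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                _, hidden = model(x, return_hidden=True)
                h = hidden[layer_idx]
                h = h.mean(dim=(2, 3)) if pooled else h.flatten(1)
            if probe is None:  # lazily size the probe to the feature dimension
                probe = nn.Linear(h.shape[1], num_classes).to(device)
                opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
            loss = F.cross_entropy(probe(h), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return probe
```

Training one probe per layer and comparing their held-out accuracies traces how linear separability develops through the network; comparing pooled=True against pooled=False at the same layer tests whether the linearly decodable information is carried by coarse, spatially pooled features.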
Text: MPhil-4 - Version of Record
Text: Final-thesis-submission-Examination-Mr-Dominic-Belcher (Restricted to Repository staff only)
More information
Published date: 2025
Identifiers
Local EPrints ID: 502042
URI: http://eprints.soton.ac.uk/id/eprint/502042
PURE UUID: 49cb30bb-846a-4ef4-9ea1-8c4a04aab537
Catalogue record
Date deposited: 13 Jun 2025 17:38
Last modified: 11 Sep 2025 01:59
Contributors
Author: Dominic Belcher
Thesis advisor: Adam Prugel-Bennett
Thesis advisor: Srinandan Dasmahapatra