University of Southampton Institutional Repository

Advancing differentiable program optimisation via novel first and second-order metrics, and adaptive optimisation strategies

Tuddenham, Mark (2026) Advancing differentiable program optimisation via novel first and second-order metrics, and adaptive optimisation strategies. University of Southampton, Doctoral Thesis, 204pp.

Record type: Thesis (Doctoral)

Abstract

This thesis concerns the optimisation of deep neural networks: how they behave during training, how they can be made more efficient, and how methods and ideas from classical optimisation change when applied in a deep learning context.

The first part of the thesis presents a first-order analysis of the training of deep neural networks, focusing on the long-range structure of the learning process and how well it can be approximated by a stochastic process.
We show that the learning process of deep neural networks can be reasonably well approximated by a diffusion process, and that this approximation can be used to understand the differences between the most popular optimisers, including the artefacts introduced by their update equations and their differing convergence rates.
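The abstract does not state the form of this approximation, but a standard diffusion (SDE) approximation of SGD, which the wording above suggests, can be sketched as follows; the symbols theta, L, eta and Sigma are our notation, not the thesis's:

    % A minimal sketch of the standard SDE approximation of SGD;
    % the thesis's exact formulation may differ.
    % \theta_t: parameters, L: loss, \eta: learning rate,
    % \Sigma(\theta): minibatch gradient-noise covariance, W_t: a Wiener process.
    \[
      \mathrm{d}\theta_t = -\nabla L(\theta_t)\,\mathrm{d}t
        + \sqrt{\eta\,\Sigma(\theta_t)}\,\mathrm{d}W_t
    \]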
The second part of the thesis presents a second-order analysis of the training process of deep neural networks, focusing on the curvature of the loss surface and how it can be used to improve the optimisation process.
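For reference, the curvature in question is captured by the Hessian of the loss; the classical curvature-aware update preconditions the gradient with its inverse. This is textbook material, not the specific method developed in the thesis:

    % Hessian of the loss and the generic Newton-style update;
    % an illustration of second-order optimisation in general,
    % not the thesis's own algorithm.
    \[
      H(\theta) = \nabla^2 L(\theta), \qquad
      \theta_{t+1} = \theta_t - \eta\, H(\theta_t)^{-1} \nabla L(\theta_t)
    \]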
In the third part we propose a new optimisation algorithm, orthogonalised stochastic gradient descent (OSGD), which introduces a diversification bias on the convolutional filters via orthonormalisation. We also show that the adaptation that turns SGD into OSGD can be applied to other optimisers, including Adam and RMSProp, to improve their convergence rates. We show that this algorithm can be used to train deep neural networks in fewer epochs and with better generalisation performance.
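As a rough illustration of the idea only (the SVD-based projection and all names below are our assumptions, not the thesis's published algorithm), an OSGD-style step might orthonormalise each layer's gradient before applying the usual update:

    import torch

    def orthonormalise(grad):
        # Replace a multi-filter gradient (out_channels x rest) with the
        # nearest semi-orthogonal matrix via the SVD, i.e. keep the
        # gradient's directions but equalise its singular values.
        g = grad.reshape(grad.shape[0], -1)            # one row per filter
        u, _, vh = torch.linalg.svd(g, full_matrices=False)
        return (u @ vh).reshape(grad.shape)            # drop singular values

    def osgd_step(params, lr=0.01):
        # One illustrative OSGD-style update: orthonormalise the gradient
        # of every weight tensor with at least two dimensions (conv and
        # linear layers), then take a plain SGD step.
        with torch.no_grad():
            for p in params:
                if p.grad is None:
                    continue
                g = p.grad
                if g.dim() >= 2:
                    g = orthonormalise(g)
                p -= lr * g

The same projection could, in principle, be fed to the Adam or RMSProp update in place of the raw gradient, which is the adaptation the abstract describes.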

The thesis concludes with an overview of the results, a discussion of their implications, and some promising directions for future study.

Text
thesis - Version of Record
Available under License University of Southampton Thesis Licence.
Download (16MB)
Text
Final-thesis-submission-Examination-Mr-Mark-Tuddenham
Restricted to Repository staff only

More information

Published date: 2026

Identifiers

Local EPrints ID: 511451
URI: http://eprints.soton.ac.uk/id/eprint/511451
PURE UUID: 162ed04c-f75d-4527-b337-46c81c3746c9
ORCID for Mark Tuddenham: orcid.org/0000-0002-3428-4051
ORCID for Jonathon Hare: orcid.org/0000-0003-2921-4283

Catalogue record

Date deposited: 14 May 2026 16:45
Last modified: 15 May 2026 01:59

Contributors

Author: Mark Tuddenham
Thesis advisor: Adam Prugel-Bennett
Thesis advisor: Jonathon Hare

