University of Southampton Institutional Repository

Learning factorised representation via generative models.

University of Southampton
Zeng, Zezhen
d340d998-568a-434f-95eb-ef39ee335912
Prugel-Bennett, Adam
b107a151-1751-4d8b-b8db-2c395ac4e14e

Zeng, Zezhen (2022) Learning factorised representation via generative models. University of Southampton, Doctoral Thesis, 121pp.

Record type: Thesis (Doctoral)

Abstract

Deep learning has been widely used in real-life applications over the last few decades, such as face recognition, machine translation, object detection and classification. Representation learning is an important part of deep learning and can be understood, at its simplest, as a method for dimensionality reduction. However, the representation learned by a task-specific model is hard to apply to other tasks without parameter tuning, since it discards information irrelevant to the original task. Generative models, by contrast, learn a joint distribution over all variables, so the latent space retains almost all of the information in the dataset rather than only task-specific information. Vanilla generative models, however, learn only an entangled representation, which is difficult to use efficiently; a factorised representation is therefore needed in most cases. Focusing on images, this thesis proposes new methods for learning factorised representations. The thesis begins by visually assessing the quality of the representation learned by the backbone model, the Variational Autoencoder (VAE). The proposed tool alleviates the blurriness of the vanilla VAE by introducing a discriminator. The potential of the VAE for transfer learning is then explored. Collecting data is expensive, especially labelled data, and transfer learning is one way to address this issue; the results show that the VAE generalises well, producing reasonable results even without parameter tuning. For factorised representation learning, this thesis proceeds from a shallow level to a deep level. We propose a VAE-based model that learns a latent space factorising the foreground and the background of images, where the foreground is defined as the objects inside given bounding-box labels. This factorised latent space allows the model to perform conditional generation, and the results achieve a state-of-the-art Fréchet inception distance (FID) score.
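For readers unfamiliar with the backbone model, the VAE named above trains by minimising the negative evidence lower bound (ELBO): a reconstruction term plus a KL divergence pulling the approximate posterior N(mu, diag(exp(logvar))) toward a standard normal prior. The following is a minimal background sketch of that per-sample objective, not code from the thesis; the function names and the squared-error reconstruction term are illustrative choices.

```python
import math

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims."""
    return sum(-0.5 * (1.0 + lv - m * m - math.exp(lv))
               for m, lv in zip(mu, logvar))

def vae_loss(x, x_recon, mu, logvar):
    """Negative ELBO: reconstruction error plus the KL regulariser."""
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return recon + kl_to_standard_normal(mu, logvar)

# A posterior that already matches the prior contributes zero KL.
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # 0.0
```

The KL term is what regularises the latent space; the discriminator mentioned in the abstract would add an adversarial term on top of a loss of this shape to sharpen reconstructions.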
We then investigate unsupervised object-centric representation learning, which can be seen as a deeper level of foreground representation. Observing that object regions tend to contain more information than the background in multi-object scenes, the model is designed to discover objects from this difference. The learned representation yields better results on the downstream task than those of related models.
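As background on the FID score reported in the abstract: FID fits a Gaussian to the Inception features of real and generated images and computes the Fréchet distance ||mu1 - mu2||² + Tr(S1 + S2 - 2(S1·S2)^½) between the two fits. The sketch below assumes diagonal covariances so the matrix square root reduces to an elementwise one; the general case needs a full matrix square root, and the function name is illustrative.

```python
import math

def fid_diagonal(mu1, var1, mu2, var2):
    """Fréchet distance between N(mu1, diag(var1)) and N(mu2, diag(var2))."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term

# Identical feature distributions have zero distance (lower FID is better).
print(fid_diagonal([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0]))  # 0.0
```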

Text
Zezhen Zeng, Doctoral thesis: Learning Factorised Representation Via Generative Models - Version of Record
Available under License University of Southampton Thesis Licence.
Download (12MB)
Text
PTD_Thesis_Zeng-SIGNED
Restricted to Repository staff only
Available under License University of Southampton Thesis Licence.

More information

Published date: 1 August 2022

Identifiers

Local EPrints ID: 472890
URI: http://eprints.soton.ac.uk/id/eprint/472890
PURE UUID: 846141fc-826c-458c-9338-a3a12f735aab

Catalogue record

Date deposited: 05 Jan 2023 17:40
Last modified: 17 Mar 2024 00:01

Contributors

Author: Zezhen Zeng
Thesis advisor: Adam Prugel-Bennett
