Learning factorised representation via generative models.
University of Southampton
Zeng, Zezhen
1 August 2022
Prugel-Bennett, Adam
Zeng, Zezhen (2022) Learning factorised representation via generative models. University of Southampton, Doctoral Thesis, 121pp.
Record type: Thesis (Doctoral)
Abstract
Deep learning has been widely applied over the last few decades in real-life applications such as face recognition, machine translation, object detection and classification. Representation learning is an important part of deep learning and can be broadly understood as a method for dimensionality reduction. However, the representation learned by a task-specific model is hard to apply to other tasks without parameter tuning, since it discards information from the input that is irrelevant to the task. Generative models, by contrast, learn a joint distribution over all variables, so their latent space retains almost all of the information in the dataset rather than only task-specific information. Vanilla generative models, however, learn only an entangled representation, which cannot be used efficiently; a factorised representation is therefore needed in most cases. Focusing on images, this thesis proposes new methods for learning a factorised representation. The thesis starts by visually assessing the quality of the representation learned by the backbone model, the Variational Autoencoder (VAE). The proposed tool alleviates the blurriness of the vanilla VAE by introducing a discriminator. The potential of the VAE for transfer learning is then explored: collecting data is expensive, especially labelled data, and transfer learning is one way to mitigate this cost. The results show that the VAE generalises well, producing reasonable results even without parameter tuning. For factorised representation learning, the thesis proceeds from a shallow level to a deep level. We propose a VAE-based model that learns a latent space factorising the foreground and background of images, where the foreground is defined in the experiments as the objects inside given bounding-box labels. This factorised latent space allows the model to perform conditional generation, and the results achieve a state-of-the-art Fréchet inception distance (FID) score. We then investigate unsupervised object-centric representation learning, which can be seen as a deeper level of foreground representation. Observing that object regions tend to contain more information than the background in multi-object scenes, the model is designed to discover objects from this difference. With the learned representation, better results are obtained on a downstream task than with other related models.
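For readers unfamiliar with the backbone model, the sketch below shows a minimal PyTorch VAE trained with the negative ELBO plus a discriminator-based adversarial term, i.e. the general VAE-GAN recipe the abstract alludes to for reducing blurriness. The architecture, dimensions, loss weighting, and names (`VAE`, `vae_gan_loss`, `disc`) are illustrative assumptions, not the thesis's actual implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VAE(nn.Module):
        """Minimal fully connected VAE for flattened images (e.g. 28x28 -> 784)."""
        def __init__(self, x_dim=784, z_dim=16, h_dim=256):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
            self.mu = nn.Linear(h_dim, z_dim)
            self.logvar = nn.Linear(h_dim, z_dim)
            self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                     nn.Linear(h_dim, x_dim), nn.Sigmoid())

        def forward(self, x):
            h = self.enc(x)
            mu, logvar = self.mu(h), self.logvar(h)
            # Reparameterisation trick: z = mu + sigma * eps, with eps ~ N(0, I)
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            return self.dec(z), mu, logvar

    # Illustrative discriminator: a real/fake classifier over flattened images.
    disc = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

    def vae_gan_loss(vae, disc, x, adv_weight=0.1):
        """Negative ELBO plus a generator-side adversarial term that pushes
        reconstructions towards the discriminator's notion of 'real',
        which tends to sharpen otherwise blurry VAE outputs.
        Assumes pixel values of x lie in [0, 1]."""
        x_hat, mu, logvar = vae(x)
        recon = F.binary_cross_entropy(x_hat, x, reduction='sum')
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        adv = F.binary_cross_entropy_with_logits(
            disc(x_hat), torch.ones(x.size(0), 1))
        return recon + kl + adv_weight * adv

In a full training loop the discriminator would be updated separately on real images versus reconstructions, with only the generator-side term above flowing into the VAE's gradients.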
Text: Zezhen Zeng, Doctoral thesis: Learning Factorised Representation Via Generative Models - Version of Record
Text: PTD_Thesis_Zeng-SIGNED - Restricted to Repository staff only
More information
Published date: 1 August 2022
Identifiers
Local EPrints ID: 472890
URI: http://eprints.soton.ac.uk/id/eprint/472890
PURE UUID: 846141fc-826c-458c-9338-a3a12f735aab
Catalogue record
Date deposited: 05 Jan 2023 17:40
Last modified: 17 Mar 2024 00:01
Contributors
Author: Zezhen Zeng
Thesis advisor: Adam Prugel-Bennett