The University of Southampton
University of Southampton Institutional Repository

Non-negative matrix factorisation: Algorithms and applications


Squires, Steven E. (2019) Non-negative matrix factorisation: Algorithms and applications. University of Southampton, Doctoral Thesis, 178pp.

Record type: Thesis (Doctoral)

Abstract

Non-negative matrix factorisation (NMF) is attractive in data analysis because it can produce a sparse, parts-based representation of the data. In this thesis we investigate and demonstrate solutions to several aspects of NMF. In particular, we consider the often overlooked issue of model selection, taking a principled information-theoretic approach; provide a method for including external data in the NMF formulation; implement an autoencoder framework that can perform variants of NMF; and extend this autoencoder approach using variational methods to produce a probabilistic form of NMF.

Two problems of model selection are explored in this thesis by taking a minimum description length (MDL) approach. First, we produce and demonstrate a method that uses MDL to estimate the optimal size of the subspace to project onto. Second, we extend this work by using MDL within the NMF objective function to provide an automatic method of regularising the factorised matrices. The final representation should then result from a principled trade-off between accuracy and complexity, reducing the problem of over-fitting.

In standard NMF we factorise our data into two matrices without consideration of any external sources of information. If we could include these exogenous drivers in the NMF formulation, it might enable an improved factorisation to be formed using information not directly available in the internal data. Our solution to this problem, called XNMF, finds a combined representation which includes these external drivers by using an extended version of the popular multiplicative update method. We prove that our method is guaranteed to reduce the objective function monotonically, and confirm this empirically by testing on financial data. In addition, we demonstrate that XNMF produces an improved representation and that it may produce better clusters in the data than standard NMF. We also investigate the broader utility of XNMF by applying it to a problem in biology: spatial proteomics. We demonstrate that there are useful data analysis advantages to using XNMF and that there may be many other biological applications of this technique.
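For orientation, the multiplicative update method mentioned above can be sketched for standard NMF as follows. This is the widely used Lee and Seung update for the squared Frobenius error, not the XNMF extension, whose update rules are not given in this record. The function name `nmf_multiplicative`, its parameters, and the synthetic data are illustrative assumptions.

```python
import numpy as np


def nmf_multiplicative(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factorise a non-negative matrix V (m x n) as W @ H.

    Standard Lee-Seung multiplicative updates for the squared
    Frobenius error ||V - W H||^2. Hypothetical helper written for
    illustration; it is not the XNMF algorithm from the thesis.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Strictly positive random initialisation keeps both factors non-negative.
    W = rng.random((m, rank)) + eps
    H = rng.random((rank, n)) + eps
    errors = []
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs never flip.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H, "fro") ** 2)
    return W, H, errors


# Synthetic non-negative data with an exact rank-5 factorisation (an assumption
# made purely so the example is self-contained).
data_rng = np.random.default_rng(1)
V = data_rng.random((20, 5)) @ data_rng.random((5, 30))
W, H, errors = nmf_multiplicative(V, rank=5)
```

The list `errors` records the objective after each iteration, so the monotone decrease that the abstract refers to can be checked directly for this standard case.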

Our next contribution is to explore the use of autoencoders in producing the NMF factorisation (AE-NMF). We provide a set of methods that can produce the two factorised matrices and investigate the advantages of using AE-NMF. In particular, we show that AE-NMF allows NMF to be extended easily to a wider variety of tasks and to different objective functions.

Finally, we produce a method of combining AE-NMF with variational autoencoders to produce a probabilistic version of NMF. This method, unlike standard NMF, allows us to generate new data, provides a probabilistic representation between the input and latent space, gives some sense of the uncertainty in our representation, and produces a representation that is regularised in a principled manner. Unlike some other forms of probabilistic NMF, this approach does not require sampling techniques such as Markov chain Monte Carlo.

Text
SSquiresThesisFinal - Version of Record
Available under License University of Southampton Thesis Licence.

More information

Published date: March 2019

Identifiers

Local EPrints ID: 433528
URI: http://eprints.soton.ac.uk/id/eprint/433528
PURE UUID: 0fce621b-89ba-4b18-8eef-92e05e29b639
ORCID for Steven E. Squires: orcid.org/0000-0003-3681-1342

Catalogue record

Date deposited: 27 Aug 2019 16:30
Last modified: 07 Sep 2019 00:27

Contributors

Author: Steven E. Squires
Thesis advisor: Mahesan Niranjan
