Non-negative matrix factorisation: Algorithms and applications

Squires, Steven E. (2019) Non-negative matrix factorisation: Algorithms and applications. University of Southampton, Doctoral Thesis, 178pp.

Record type: Thesis (Doctoral)

Abstract

Non-negative matrix factorisation (NMF) is attractive in data analysis because it can produce a sparse, parts-based representation of the data. In this thesis we investigate and demonstrate solutions to several aspects of NMF. In particular, we address the oft-overlooked issue of model selection with a principled, information-theoretic approach; provide a method for including external data in the NMF formulation; implement an autoencoder framework that can perform variants of NMF; and extend this autoencoder approach using variational methods to produce a probabilistic form of NMF.

Two problems of model selection are explored in this thesis by taking a minimum description length (MDL) approach. First, we produce and demonstrate a method that uses MDL to estimate the optimal size of the subspace to project onto. Second, we extend this work by using MDL within the NMF objective function to provide an automatic method of regularising the factorised matrices. The final representation should then result from a principled trade-off between accuracy and complexity, reducing the problem of over-fitting.
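As a rough illustration of the accuracy-complexity trade-off described above, a generic two-part MDL criterion can be written as follows. The symbols V (data matrix), W_k and H_k (factors for subspace size k) and the code-length function L are assumptions introduced here for illustration; the coding scheme actually used in the thesis is not specified in this abstract.

```latex
\[
  \hat{k} \;=\; \arg\min_{k}\;
  \Big[\,
    \underbrace{L(W_k) + L(H_k)}_{\text{cost of describing the model}}
    \;+\;
    \underbrace{L\!\left(V - W_k H_k \,\middle|\, W_k, H_k\right)}_{\text{cost of describing the residual}}
  \,\Big]
\]
```

A larger k typically shrinks the second term while inflating the first, so the minimiser balances accuracy against complexity.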

In standard NMF we factorise our data into two matrices without consideration of any external sources of information. If we could include these exogenous drivers in the NMF formulation, it might enable an improved factorisation to be formed using information not directly available in the internal data. Our solution to this problem, called XNMF, finds a combined representation which includes these external drivers by utilising an extended version of the popular multiplicative update method. We prove theoretically that our method is guaranteed to reduce the objective function monotonically, and confirm empirically that it does so by testing on financial data. In addition, we demonstrate that XNMF produces an improved representation and that it may produce better clusters in the data than standard NMF. We also investigate the broader utility of XNMF by applying it to a problem in biology: spatial proteomics. We demonstrate that there are useful data analysis advantages to using XNMF and that there may be many other biological applications of this technique.
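The multiplicative update method that XNMF extends can be sketched for standard NMF as below. This is a minimal illustration of the well-known updates for the squared Frobenius objective, not the XNMF algorithm from the thesis. The function and variable names (`nmf_multiplicative`, `V`, `W`, `H`, `k`, `eps`) are assumptions, and plain Python lists are used only to keep the sketch dependency-free.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    B_cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]


def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def frobenius_error(V, W, H):
    """Squared Frobenius norm of the residual V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def nmf_multiplicative(V, k, n_iter=200, seed=0, eps=1e-9):
    """Factorise non-negative V (n x m) as W (n x k) times H (k x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Strictly positive random initialisation keeps every update well defined.
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    errors = [frobenius_error(V, W, H)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
        errors.append(frobenius_error(V, W, H))
    return W, H, errors


# Small hypothetical data matrix, used only to exercise the sketch.
V = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0], [3.0, 5.0, 7.0]]
W, H, errors = nmf_multiplicative(V, k=2)
```

Because each update only multiplies by a ratio of non-negative quantities, `W` and `H` stay non-negative without any explicit projection, and the recorded `errors` should not increase from one iteration to the next (up to floating-point rounding).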

Our next contribution is to explore the use of autoencoders in producing the NMF factorisation (AE-NMF). We provide a set of methods that can produce the two factorised matrices and investigate the advantages of using AE-NMF. In particular, we show that AE-NMF allows NMF to be extended easily to perform a wider variety of tasks and to use different objective functions.

Finally, we produce a method of combining AE-NMF with variational autoencoders to produce a probabilistic version of NMF. Unlike standard NMF, this method allows us to generate new data, provides a probabilistic representation between the input and latent space, gives some sense of the uncertainty in our representation, and produces a representation that is regularised in a principled manner. Unlike some other forms of probabilistic NMF, this approach does not require sampling techniques such as Markov chain Monte Carlo.
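For orientation, the generic objective maximised by a variational autoencoder, the evidence lower bound, is sketched below. The notation (input v, latent code h, encoder q_phi, decoder p_theta, prior p) is assumed here; the particular distributions and non-negativity constraints used in the thesis are not given in this abstract.

```latex
\[
  \log p_\theta(v) \;\ge\;
  \mathbb{E}_{q_\phi(h \mid v)}\!\left[\log p_\theta(v \mid h)\right]
  \;-\;
  \mathrm{KL}\!\left(q_\phi(h \mid v)\,\middle\|\,p(h)\right)
\]
```

The first term rewards accurate reconstruction, while the KL term regularises the latent representation towards the prior.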
