Interpretable modelling with sparse kernels

Kandola, J.S. (2001) Interpretable modelling with sparse kernels. University of Southampton, Electronics and Computer Science : University of Southampton, Doctoral Thesis, 177pp.

Record type: Thesis (Doctoral)

Abstract

A drawback of many statistical modelling techniques commonly used in machine learning is that the resulting model is difficult to interpret. The principal focus of this thesis is the development of advanced non-linear interpretable models. Interpretable modelling offers a powerful tool with which to understand the structure of a model constructed from data, allowing model validation and assisting in model selection. Gibbs (1997) observes that the easiest way to introduce model interpretability is to use models where the parameters and the related hyperparameters have clearly interpretable meanings. The Bayesian methodology of Automatic Relevance Determination (ARD) (MacKay, 1994; Neal, 1995) is one such approach. In this thesis Laplace approximations, variational learning and Markov Chain Monte Carlo (MCMC) methods for hyperparameter determination are assessed within a Bayesian neural network. Empirical results highlight the numerical instability of the Laplace and variational methods, which converge to a local rather than a global minimum. Kernel methods have become a popular modelling approach (Vapnik, 1998; Smola, 1998; Williams, 1998). In this thesis the constructed kernel models are equipped with hyperparameters that allow important input variables to be selected, the model structure to be visualised, and prior or expert knowledge to be incorporated. Ideas from the Bayesian and the signal processing communities have been merged with the representational advantage of a sparse ANOVA decomposition. Interpretability is introduced by using two forms of regularisation: a 1-norm based structural regulariser to enforce interpretability, and a 2-norm based regulariser to control smoothness. The model structure can be visualised, showing the overall effects of different inputs, their interactions, and the strength of the interactions.
The performance of these interpretable learning algorithms is demonstrated on both synthetic and “real” data, notably the AMPG dataset, the Boston house price dataset, and the problem of predicting the mechanical property proof stress of a metal based on its chemical composition. Results from these different approaches are compared in terms of their interpretability by exploiting prior knowledge of the problem, and show the potential of interpretable data models.
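
The two forms of regularisation named in the abstract can be illustrated with a small sketch. This is not the algorithm from the thesis: it is a minimal two-stage illustration that assumes an additive model with one Gaussian kernel per input (first-order ANOVA terms only), a ridge (2-norm) fit for smoothness, and a non-negative 1-norm penalty on per-input weights solved by projected gradient descent. The function names, the lengthscale, and the values of `lam1` and `lam2` are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_gram(x, lengthscale=0.5):
    """Gram matrix of a 1-D Gaussian kernel on a single input variable."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def fit_sparse_additive(X, y, lam2=0.1, lam1=2.0, n_iter=500):
    """Two-stage sketch: smooth additive fit, then sparse per-input weights."""
    n, d = X.shape
    grams = [gaussian_gram(X[:, j]) for j in range(d)]

    # Stage 1 (2-norm regulariser): kernel ridge regression on the
    # sum of the per-input kernels, which controls smoothness.
    K = sum(grams)
    alpha = np.linalg.solve(K + lam2 * np.eye(n), y)

    # Column j holds the contribution of input j to the fitted values.
    D = np.column_stack([G @ alpha for G in grams])

    # Stage 2 (1-norm regulariser): minimise
    #   0.5 * ||y - D c||^2 + lam1 * sum(c)   subject to c >= 0
    # by projected gradient descent with step 1 / ||D||_2^2.
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    c = np.zeros(d)
    for _ in range(n_iter):
        grad = D.T @ (D @ c - y) + lam1
        c = np.maximum(0.0, c - step * grad)
    return c, D


# Synthetic data (an assumption for illustration): only input 0 matters.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(80, 3))
y = np.sin(3.0 * X[:, 0]) + 0.05 * rng.standard_normal(80)

weights, components = fit_sparse_additive(X, y)
print("per-input weights:", weights)
```

On data of this kind the weight on the first input would be expected to dominate, with the weights on the irrelevant inputs driven towards zero. Plotting each column of `components` against its input gives a simple view of the individual input effects, in the spirit of the model visualisation the abstract describes.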

Text

Thesis.pdf - Version of Record

Available under License University of Southampton Thesis Licence.
