Interpretable modelling with sparse kernels
Kandola, J.S. (2001) Interpretable modelling with sparse kernels. University of Southampton, Electronics and Computer Science: University of Southampton, Doctoral Thesis, 177pp.
Record type: Thesis (Doctoral)
Abstract
A drawback of many statistical modelling techniques commonly used in machine learning is that the resulting model is difficult to interpret. The principal focus of this thesis is the development of advanced non-linear interpretable models. Interpretable modelling offers a powerful tool with which to understand the structure of a model constructed from data, allowing model validation and assisting in model selection. Gibbs (1997) observes that the easiest way to introduce model interpretability is to use models whose parameters and related hyperparameters have clearly interpretable meanings. The Bayesian methodology of Automatic Relevance Determination (ARD) (MacKay, 1994; Neal, 1995) is one such approach. In this thesis, Laplace approximations, variational learning and Markov Chain Monte Carlo (MCMC) methods for hyperparameter determination are assessed within a Bayesian neural network. Empirical results highlight the numerical instability of the Laplace and variational methods, with convergence to a local rather than a global minimum. Kernel methods have become a popular modelling approach (Vapnik, 1998; Smola, 1998; Williams, 1998). In this thesis the constructed kernel models are equipped with hyperparameters that allow important input variables to be selected, the model structure to be visualised, and prior or expert knowledge to be incorporated. Ideas from the Bayesian and signal processing communities are merged with the representational advantage of a sparse ANOVA decomposition. Interpretability is introduced by using two forms of regularisation: a 1-norm based structural regulariser to enforce interpretability, and a 2-norm based regulariser to control smoothness. The model structure can be visualised, showing the overall effects of different inputs, their interactions, and the strength of the interactions. The performance of these interpretable learning algorithms is demonstrated on both synthetic and “real” data, notably the AMPG dataset, the Boston house price dataset, and the problem of predicting the mechanical property proof stress of a metal based on its chemical composition. Results from these different approaches are compared in terms of their interpretability by exploiting prior knowledge of the problem, and show the potential of interpretable data models.
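A minimal sketch of the kind of objective the abstract describes, assuming a squared loss, non-negative component weights \theta_S, and penalty scalings \lambda_1, \lambda_2 (all illustrative assumptions rather than the thesis's exact formulation): the model is a sparse ANOVA expansion over subsets S of the inputs, the 1-norm term drives whole ANOVA terms to zero (structural sparsity), and the 2-norm (RKHS) term controls the smoothness of each retained term.

    f(\mathbf{x}) = b + \sum_{S} \theta_S f_S(\mathbf{x}_S)

    \min_{b,\,\theta,\,\{f_S\}} \; \sum_{n=1}^{N} \bigl( y_n - f(\mathbf{x}_n) \bigr)^2
        \;+\; \lambda_1 \sum_{S} |\theta_S|
        \;+\; \lambda_2 \sum_{S} \| f_S \|_{\mathcal{H}_S}^{2}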
Text: Thesis.pdf - Version of Record
More information
Published date: 1 June 2001
Organisations: University of Southampton, Electronic & Software Systems
Identifiers
Local EPrints ID: 256087
URI: http://eprints.soton.ac.uk/id/eprint/256087
PURE UUID: b1de0571-1521-4b8e-9358-f518ec3ceaea
Catalogue record
Date deposited: 29 Nov 2003
Last modified: 14 Mar 2024 05:38
Contributors
Author: J.S. Kandola
Thesis advisor: S.R. Gunn