Optimal and sequential design for bridge regression with application in organic chemistry
This thesis presents and applies methods for the design and analysis of experiments for a family of coefficient shrinkage methods, known collectively as bridge regression, with emphasis on the two special cases of ridge regression and the lasso. The application is the problem of understanding and predicting the melting point of small molecule organic compounds using chemical descriptors.
Experiments typically have a large number of predictors compared to the number of observations, and high correlations between pairs of predictors. In this thesis, bridge regression is used to select linear models, which are then compared to models selected by more commonly used methods of variable selection, such as subset selection and stepwise selection. Models including two-way product, or interaction, terms are also considered.
A general method is developed for the selection of an optimal design when accurate estimates of the model coefficients are required. The method exploits a relationship between bridge regression and Bayesian methods to develop a class of D-optimal designs. A necessary approximation to the variance-covariance matrix of coefficient estimators is derived. Designs are found using algorithmic search for ridge regression and the lasso, for experiments with (a) two-level factors and (b) the motivating chemistry problem. Comparisons are made with alternative designs.
A sequential design criterion is developed to enhance an existing design. The criterion selects additional design points, from a finite set of candidate points, that exhibit the highest estimated prediction variance obtained from bootstrapping. The method is applied to the Bayesian D-optimal designs and is shown to be capable of improving design performance through the addition of only a small number of runs.
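As a rough illustration of the sequential criterion described above, the following Python sketch ranks candidate points by the bootstrap variance of their ridge predictions and returns the highest-ranked ones. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the case-resampling bootstrap, the fixed penalty `lam`, the choice of all `n_add` points in one batch, and the small two-level example are all hypothetical choices made here for illustration.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge estimate: solve (X'X + lam * I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def bootstrap_prediction_variance(X, y, candidates, lam, n_boot=200, seed=0):
    """Sample variance of ridge predictions at each candidate point,
    taken over fits to case-resampled bootstrap data sets (an assumption)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    preds = np.empty((n_boot, candidates.shape[0]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample runs with replacement
        beta = ridge_fit(X[idx], y[idx], lam)
        preds[b] = candidates @ beta
    return preds.var(axis=0, ddof=1)


def select_additional_points(X, y, candidates, lam, n_add=2, **kwargs):
    """Indices of the n_add candidates with the largest estimated
    prediction variance, together with all the variance estimates."""
    var = bootstrap_prediction_variance(X, y, candidates, lam, **kwargs)
    order = np.argsort(var)[::-1]  # largest variance first
    return order[:n_add], var


# Hypothetical example: three two-level factors, a 4-run starting design,
# and the full 2^3 factorial as the finite candidate set.
levels = (-1.0, 1.0)
candidates = np.array([[a, b, c] for a in levels for b in levels for c in levels])
X = candidates[[0, 3, 5, 6]]
rng = np.random.default_rng(1)
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(scale=0.1, size=4)

chosen, var = select_additional_points(X, y, candidates, lam=1.0, n_add=2)
print("candidate rows to add:", chosen)
```

A positive `lam` keeps the linear system solvable even when a bootstrap resample repeats runs and `X'X` becomes singular. For the lasso, the ridge solver would need to be replaced by a lasso fit, which this sketch does not attempt.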
Carnaby, Sarah
10 January 2011
Woods, D.C.
Carnaby, Sarah (2011) Optimal and sequential design for bridge regression with application in organic chemistry. University of Southampton, School of Mathematics, Doctoral Thesis, 182pp.
Record type: Thesis (Doctoral)
Text: Sarah_Carnaby_PhD_Thesis.pdf - Other
More information
Published date: 10 January 2011
Organisations: University of Southampton
Identifiers
Local EPrints ID: 176445
URI: http://eprints.soton.ac.uk/id/eprint/176445
PURE UUID: cc0fdcea-8498-4032-ba06-b57df77da3cd
Catalogue record
Date deposited: 24 May 2011 13:26
Last modified: 14 Mar 2024 02:44
Contributors
Author: Sarah Carnaby