The University of Southampton
University of Southampton Institutional Repository

Optimal and sequential design for bridge regression with application in organic chemistry

Carnaby, Sarah
7deb6e07-512c-4ce8-b3ba-17964fd485d5
Woods, David
ae21f7e2-29d9-4f55-98a2-639c5e44c79c

Carnaby, Sarah (2011) Optimal and sequential design for bridge regression with application in organic chemistry. University of Southampton, School of Mathematics, Doctoral Thesis, 182pp.

Record type: Thesis (Doctoral)

Abstract

This thesis presents and applies methods for the design and analysis of experiments for a family of coefficient shrinkage methods, known collectively as bridge regression, with emphasis on the two special cases of ridge regression and the lasso. The application is the problem of understanding and predicting the melting point of small molecule organic compounds using chemical descriptors.
Experiments typically have a large number of predictors compared to the number of observations, and high correlations between pairs of predictors. In this thesis, bridge regression is used to select linear models which are then compared to models selected by more commonly used methods of variable selection, such as subset selection and stepwise selection. Models including two-way product, or interaction, terms are also considered. A general method is developed for the selection of an optimal design when accurate estimates of the model coefficients are required. The method exploits a relationship between bridge regression and Bayesian methods which is used to develop a class of D-optimal designs. A necessary approximation to the variance-covariance matrix of coefficient estimators is derived. Designs are found using algorithmic search for ridge regression and the lasso, for experiments with (a) two-level factors and (b) the motivating chemistry problem. Comparisons are made with alternative designs. A sequential design criterion is developed to enhance an existing design. The criterion selects additional design points, from a finite set of candidate points, that exhibit the highest estimated prediction variance obtained from bootstrapping. The method is applied to the Bayesian D-optimal designs and is shown to be capable of improving design performance through the addition of only a small number of runs.
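
The sequential criterion described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes ridge regression without an intercept, a fixed penalty `lam`, and case (row) resampling for the bootstrap; the thesis may use a different resampling scheme, penalty choice, or estimator. All function names and the small two-level example are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Ridge coefficients: solve (X'X + lam * I) beta = X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def bootstrap_prediction_variance(X, y, candidates, lam=1.0, n_boot=200, seed=0):
    """Estimate the prediction variance at each candidate point.

    Assumption: rows of the existing design are resampled with replacement,
    the ridge model is refitted, and the variance of the resulting
    predictions is taken across bootstrap replicates.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    preds = np.empty((n_boot, candidates.shape[0]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)      # resample runs with replacement
        beta = ridge_fit(X[idx], y[idx], lam)
        preds[b] = candidates @ beta          # predictions at all candidates
    return preds.var(axis=0, ddof=1)


def select_additional_points(X, y, candidates, k, **kwargs):
    """Return indices of the k candidates with the highest estimated variance."""
    var = bootstrap_prediction_variance(X, y, candidates, **kwargs)
    order = np.argsort(var)[::-1]             # largest variance first
    return order[:k], var


# Hypothetical example: three two-level factors, candidates = full 2^3 factorial.
rng = np.random.default_rng(1)
candidates = np.array(
    [[a, b, c] for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)], dtype=float
)
X = candidates[[0, 3, 5, 6]]                  # an existing 4-run design
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(scale=0.1, size=4)

chosen, var = select_additional_points(X, y, candidates, k=2, lam=0.5)
print("candidate rows to add:", chosen)
```

Because the penalty `lam` is positive, the linear system stays solvable even when a bootstrap resample repeats runs, which is one reason ridge is a convenient choice for a sketch like this.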

PDF
Sarah_Carnaby_PhD_Thesis.pdf - Other
Download (1MB)

More information

Published date: 10 January 2011
Organisations: University of Southampton

Identifiers

Local EPrints ID: 176445
URI: https://eprints.soton.ac.uk/id/eprint/176445
PURE UUID: cc0fdcea-8498-4032-ba06-b57df77da3cd
ORCID for David Woods: orcid.org/0000-0001-7648-429X

Catalogue record

Date deposited: 24 May 2011 13:26
Last modified: 19 Jun 2019 00:37
