Aljeddani, Sadiah
(2018)
Statistical analysis of data from experiments subject to restricted randomisation.
*University of Southampton, Doctoral Thesis*, 175pp.

## Abstract

Selecting the best subset of variables, namely those with a strong effect on an outcome of interest, is fundamental to avoiding overfitting in statistical modelling. However, when there are many variables, finding this best subset is computationally difficult, and the difficulty increases when the design involves restricted randomisation. This work aims to fill the gap in variable selection and model estimation for data from experiments subject to restricted randomisation by developing new frequentist and Bayesian methods for these tasks. A comparative study then assesses the frequentist and Bayesian methods with respect to their performance in variable selection and model estimation. As a representative of frequentist analysis, the Penalised Generalised Least Squares (PGLS) estimator is used, in which a single shrinkage parameter is applied to all regression effects. Furthermore, because a split-plot design has two different strata, the PGLS approach is extended to perform variable selection and model estimation simultaneously in the context of split-plot designs. The resulting Penalised Generalised Least Squares for Split-Plot Design (PGLS-SPD) estimator applies two shrinkage parameters, one for the subplot effects and the other for the whole-plot effects. As a representative of Bayesian analysis, the Stochastic Search Variable Selection (SSVS) technique is used. SSVS performs variable selection and model estimation simultaneously, with the variance of all active factors sampled from a single posterior distribution.
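
PGLS and PGLS-SPD differ only in how shrinkage parameters are assigned to the regression effects. The Python sketch below illustrates this difference. It rests on several assumptions that the abstract does not state: an L1 (lasso-type) penalty, a split-plot covariance matrix `V = s2_eps * I + s2_gamma * Z Z'` whose variance components are treated as known, and hypothetical simulated data. The helper names (`split_plot_cov`, `pgls_lasso`) and the penalty values are illustrative only. The thesis's actual penalty, variance-component estimation and tuning of the shrinkage parameters may differ.

```python
import numpy as np


def split_plot_cov(whole_plot, sigma2_eps, sigma2_gamma):
    """V = sigma2_eps * I + sigma2_gamma * Z Z', with Z the whole-plot incidence matrix."""
    wp = np.asarray(whole_plot)
    same_plot = (wp[:, None] == wp[None, :]).astype(float)  # equals Z Z'
    return sigma2_eps * np.eye(len(wp)) + sigma2_gamma * same_plot


def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)


def pgls_lasso(X, y, V, penalties, n_iter=1000, tol=1e-10):
    """Minimise 0.5 * (y - Xb)' V^{-1} (y - Xb) + sum_j penalties[j] * |b_j|
    by coordinate descent on the GLS-whitened data L^{-1} y, L^{-1} X, where V = L L'."""
    L = np.linalg.cholesky(V)
    Xt = np.linalg.solve(L, X)
    yt = np.linalg.solve(L, y)
    col_sq = (Xt ** 2).sum(axis=0)
    b = np.zeros(X.shape[1])
    r = yt.copy()                      # residual yt - Xt @ b for b = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(X.shape[1]):
            r += Xt[:, j] * b[j]       # partial residual excluding column j
            new = soft_threshold(Xt[:, j] @ r, penalties[j]) / col_sq[j]
            max_change = max(max_change, abs(new - b[j]))
            b[j] = new
            r -= Xt[:, j] * b[j]
        if max_change < tol:
            break
    return b


# Hypothetical 8 x 4 split-plot design: whole-plot factors w1, w2; subplot factors s1, s2.
rng = np.random.default_rng(1)
wp = np.repeat(np.arange(8), 4)
w1 = np.repeat([-1, 1, -1, 1, -1, 1, -1, 1], 4)
w2 = np.repeat([-1, -1, 1, 1, -1, -1, 1, 1], 4)
s1 = np.tile([-1, 1, -1, 1], 8)
s2 = np.tile([-1, -1, 1, 1], 8)
X = np.column_stack([np.ones(32), w1, w2, s1, s2])
beta_true = np.array([1.0, 2.0, 0.0, 1.5, 0.0])
y = X @ beta_true + rng.normal(0, 1.0, 8)[wp] + rng.normal(0, 0.5, 32)
V = split_plot_cov(wp, sigma2_eps=0.25, sigma2_gamma=1.0)  # assumed known here

lam, lam_w, lam_s = 5.0, 5.0, 20.0  # hypothetical penalty values; intercept is unpenalised
pgls = pgls_lasso(X, y, V, np.array([0, lam, lam, lam, lam]))              # one shrinkage parameter
pgls_spd = pgls_lasso(X, y, V, np.array([0, lam_w, lam_w, lam_s, lam_s]))  # two shrinkage parameters
print("PGLS:    ", np.round(pgls, 3))
print("PGLS-SPD:", np.round(pgls_spd, 3))
```

Whole-plot effects are usually estimated less precisely than subplot effects. A single shrinkage parameter can therefore over-shrink one stratum while under-shrinking the other. Giving each stratum its own parameter (here `lam_w` and `lam_s`) lets the two be tuned separately, for example by cross-validation or an information criterion (not shown).
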
To account for the two strata of a split-plot design, the SSVS approach to Bayesian variable selection is also extended to data from experiments with restricted randomisation. The result is Stochastic Search Variable Selection for Split-Plot Design (SSVS-SPD), in which the variances of the active subplot and whole-plot factors are sampled from two different posterior distributions. The usefulness of the frequentist and Bayesian approaches is demonstrated using two practical examples, and their properties are studied in simulation studies. The results of the comparative study of frequentist and Bayesian analysis support the use of the SSVS-SPD method for the statistical analysis of data from experiments subject to restricted randomisation.
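
The central SSVS-SPD idea is that the slab variances of active whole-plot and subplot effects are drawn from two separate posterior distributions. This can be sketched as a Gibbs sampler. The sketch below is a minimal illustration under stated assumptions, not the thesis's sampler. It assumes a normal spike-and-slab prior with a fixed spike variance `v0`, inverse-gamma IG(`a`, `b`) priors on all variances, whole-plot random effects `u`, and hypothetical simulated split-plot data. The function names, the `groups` labels (`"int"`, `"wp"`, `"sp"`) and all hyperparameter values are hypothetical. Assigning every factor to the same group gives a one-variance version, matching the abstract's description of SSVS.

```python
import numpy as np


def inv_gamma(rng, shape, rate):
    # X ~ InvGamma(shape, rate)  <=>  1 / X ~ Gamma(shape, scale = 1 / rate)
    return 1.0 / rng.gamma(shape, 1.0 / rate)


def ssvs_spd(X, y, wp, groups, n_iter=3000, burn=1000, v0=0.01,
             p_incl=0.5, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for y = X beta + Z u + e, u ~ N(0, s2u I), e ~ N(0, s2e I).
    groups[j] is "int" (always included), "wp" or "sp"; groups "wp" and "sp" each have
    their own slab variance tau2[g], drawn from its own inverse-gamma full conditional."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    wp, groups = np.asarray(wp), np.asarray(groups)
    sel = groups != "int"
    m = wp.max() + 1
    k = np.bincount(wp, minlength=m)             # subplots per whole plot
    u = np.zeros(m)
    gamma = np.ones(p, dtype=bool)               # start with every factor active
    tau2 = {"wp": 1.0, "sp": 1.0}
    s2e = s2u = 1.0
    incl, beta_sum = np.zeros(p), np.zeros(p)
    for it in range(n_iter):
        # Prior variance of each beta_j: vague for the intercept, slab tau2[g] if active, spike v0 if not.
        d = np.array([1e6 if g == "int" else (tau2[g] if act else v0)
                      for g, act in zip(groups, gamma)])
        A = X.T @ X / s2e + np.diag(1.0 / d)     # posterior precision of beta
        mean = np.linalg.solve(A, X.T @ (y - u[wp]) / s2e)
        L = np.linalg.cholesky(A)
        beta = mean + np.linalg.solve(L.T, rng.standard_normal(p))  # draw from N(mean, A^{-1})
        # Whole-plot random effects, then the two error variances.
        resid = y - X @ beta
        var_u = 1.0 / (k / s2e + 1.0 / s2u)
        u = rng.normal(var_u * np.bincount(wp, weights=resid, minlength=m) / s2e,
                       np.sqrt(var_u))
        e = resid - u[wp]
        s2e = inv_gamma(rng, a + n / 2, b + e @ e / 2)
        s2u = inv_gamma(rng, a + m / 2, b + u @ u / 2)
        # Inclusion indicators from the log posterior odds of slab versus spike.
        t = np.array([tau2[g] for g in groups[sel]])
        bj = beta[sel]
        log_odds = (np.log(p_incl / (1 - p_incl)) + 0.5 * np.log(v0 / t)
                    + 0.5 * bj ** 2 * (1 / v0 - 1 / t))
        gamma[sel] = rng.random(sel.sum()) < 0.5 * (1 + np.tanh(log_odds / 2))
        # Separate slab-variance updates for whole-plot and subplot effects.
        for g in ("wp", "sp"):
            active = sel & (groups == g) & gamma
            tau2[g] = inv_gamma(rng, a + active.sum() / 2,
                                b + (beta[active] ** 2).sum() / 2)
        if it >= burn:
            incl += gamma
            beta_sum += beta
    kept = n_iter - burn
    return incl / kept, beta_sum / kept


# Hypothetical 8 x 4 split-plot data: intercept, w1, w2 (whole-plot), s1, s2 (subplot).
rng = np.random.default_rng(1)
wp = np.repeat(np.arange(8), 4)
X = np.column_stack([np.ones(32),
                     np.repeat([-1, 1, -1, 1, -1, 1, -1, 1], 4),
                     np.repeat([-1, -1, 1, 1, -1, -1, 1, 1], 4),
                     np.tile([-1, 1, -1, 1], 8),
                     np.tile([-1, -1, 1, 1], 8)])
y = X @ np.array([1.0, 2.0, 0.0, 1.5, 0.0]) + rng.normal(0, 1.0, 8)[wp] + rng.normal(0, 0.5, 32)

groups_spd = ["int", "wp", "wp", "sp", "sp"]   # two slab variances (SSVS-SPD idea)
groups_one = ["int", "sp", "sp", "sp", "sp"]   # one shared slab variance (SSVS idea)
for name, grp in (("SSVS-SPD", groups_spd), ("one-variance SSVS", groups_one)):
    prob, post_mean = ssvs_spd(X, y, wp, grp)
    print(name, "inclusion:", np.round(prob, 2), "mean:", np.round(post_mean, 2))
```

The returned posterior inclusion probabilities act as the variable-selection criterion, for example by retaining factors whose probability exceeds 0.5. In this sketch, the spike variance `v0` roughly sets how small an effect must be before it is treated as inactive.
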

