Statistical approaches to the analysis of hierarchical data using simulations and real data from a study of musculoskeletal symptoms

Clustering of observations is a common phenomenon in epidemiological research. The first objective of this thesis was to explore the situations in which failure to account for clustering in statistical analysis could lead to erroneous conclusions. Using simulated data, I showed that effects estimated from a naïve regression model that ignored clustering were on average unbiased when the outcome was continuous, but were biased towards the null when the outcome was binary. The precision of effect estimates was overestimated when the outcome was binary, and also when both the outcome and the explanatory variable were continuous. However, in linear regression with a binary explanatory variable, the precision of effects was somewhat underestimated by the naïve model. The magnitude of bias, both in point estimates and in their precision, increased with greater clustering of the outcome variable, and was also influenced by clustering in the explanatory variable.
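The case of a continuous outcome with a continuous explanatory variable can be illustrated with a small simulation. What follows is a minimal sketch, not the simulation design used in the thesis: the number of clusters, the cluster size, the variance components, the true slope, and the function names `simulate_once` and `run` are all assumptions chosen for illustration. Each replicate fits an ordinary least-squares line that ignores clustering, and the average model-based standard error is then compared with the empirical spread of the slope estimates across replicates.

```python
import random
import statistics


def simulate_once(rng, n_clusters=20, cluster_size=20, beta=0.5):
    """Simulate one clustered data set and fit a naive OLS line to it.

    All parameter values are illustrative assumptions, not thesis settings.
    Both x and y have a cluster-level component, so both are clustered.
    """
    xs, ys = [], []
    for _ in range(n_clusters):
        x_cluster = rng.gauss(0, 1)  # cluster-level part of the explanatory variable
        y_cluster = rng.gauss(0, 1)  # cluster-level random intercept of the outcome
        for _ in range(cluster_size):
            x = x_cluster + rng.gauss(0, 1)
            y = beta * x + y_cluster + rng.gauss(0, 1)
            xs.append(x)
            ys.append(y)

    # Naive simple linear regression that treats all observations as independent.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    naive_se = (rss / (n - 2) / sxx) ** 0.5
    return slope, naive_se


def run(n_sims=300, seed=1):
    """Return (mean slope, empirical SD of slopes, mean naive SE)."""
    rng = random.Random(seed)
    results = [simulate_once(rng) for _ in range(n_sims)]
    slopes = [slope for slope, _ in results]
    naive_ses = [se for _, se in results]
    return statistics.mean(slopes), statistics.stdev(slopes), statistics.mean(naive_ses)


if __name__ == "__main__":
    mean_slope, empirical_sd, mean_naive_se = run()
    print(f"mean slope estimate:      {mean_slope:.3f}")
    print(f"empirical SD of slopes:   {empirical_sd:.3f}")
    print(f"mean naive standard error: {mean_naive_se:.3f}")
```

Under these assumed settings the mean slope should sit close to the true value of 0.5, while the mean naive standard error should fall well below the empirical standard deviation of the slopes. That is the pattern described above: an unbiased point estimate whose precision is overestimated.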

A second aim was to compare analytical approaches to clustering when synthesising results from multiple studies. Using real data from a large multicentre study, I showed that odds ratios (ORs) estimated from meta-analysis of summary results from component sub-studies were generally similar to those from multi-level modelling of pooled individual data. However, the precision of point estimates from meta-analysis was lower than that from multi-level analysis. Discrepancies between the two methods (including differences in ORs of up to 27% and in precision of up to 46%) were demonstrated when the outcome of interest was rare.
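The summary-results side of this comparison can be sketched as follows. This is only an illustration of one common approach, fixed-effect inverse-variance pooling of log odds ratios from 2x2 tables. The abstract does not state which meta-analytic model the thesis used, and the counts, the helper names `log_or_and_se` and `pool_fixed_effect`, and the choice of a fixed-effect model are all assumptions.

```python
import math


def log_or_and_se(a, b, c, d):
    """Log odds ratio and its large-sample standard error from one 2x2 table.

    a: exposed cases, b: exposed non-cases,
    c: unexposed cases, d: unexposed non-cases.
    """
    if min(a, b, c, d) <= 0:
        # The estimator is undefined when any cell is empty.
        raise ValueError("all four cells must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def pool_fixed_effect(tables):
    """Inverse-variance weighted average of per-study log odds ratios."""
    estimates = [log_or_and_se(*table) for table in tables]
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


if __name__ == "__main__":
    # Hypothetical counts for three sub-studies, not data from the thesis.
    tables = [(30, 70, 20, 80), (45, 155, 30, 170), (12, 88, 9, 91)]
    pooled, pooled_se = pool_fixed_effect(tables)
    lower = math.exp(pooled - 1.96 * pooled_se)
    upper = math.exp(pooled + 1.96 * pooled_se)
    print(f"pooled OR {math.exp(pooled):.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A multi-level analysis would instead fit a single model to the pooled individual records with a random effect for sub-study. That requires a mixed-model routine and is not shown here.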

A third aim was to compare different methods for estimation of relative risks (RRs) when data are clustered. The random-intercept complementary log-log model produced estimates of effect and precision similar to those from the random-intercept log-binomial model (considered to be the best approach, but not always practical). Other models gave effect estimates close to those from the log-binomial model, but with less comparable precision. Contrary to the situation when RRs are being estimated in a set of independent (i.e. unclustered) observations, the random-intercept Poisson model with robust variance produced less precise point estimates than those from the random-intercept log-binomial model. Priorities for future work include exploration of: the consequences of ignoring clustering in the presence of effect modification and when marginal methods of analysis are used; situations in which meta-analytical estimates differ from those derived by pooled analysis; and specific situations in which the random-intercept Poisson model with robust variance is less likely to produce results similar to those from the random-intercept log-binomial model.

University of Southampton

Ntani, Georgia

March 2017

Coggon, David

Inskip, Hazel

Ntani, Georgia (2017) Statistical approaches to the analysis of hierarchical data using simulations and real data from a study of musculoskeletal symptoms. University of Southampton, Doctoral Thesis, 281pp.

Record type: Thesis (Doctoral)