The University of Southampton
University of Southampton Institutional Repository

Comparing methods of analysing datasets with small clusters-case studies using four paediatric datasets

Marston, Louise, Peacock, Janet L., Yu, Kemin, Brocklehurst, Peter, Calvert, Sandra, Greenough, Anne and Marlow, Neil (2009) Comparing methods of analysing datasets with small clusters-case studies using four paediatric datasets. Paediatric and Perinatal Epidemiology, 23 (4), 380-392. (doi:10.1111/j.1365-3016.2009.01046.x).

Record type: Article

Abstract

Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata), gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples), there appears to be less need to adjust for clustering.
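As a rough illustration of why ignoring small clusters can mislead, the sketch below compares a naive standard error with a cluster-adjusted (sandwich) standard error for a simple sample mean. It is not the analysis from the paper, which used regression models in Stata. The data values, the cluster labels and the function name `naive_and_cluster_se` are hypothetical, and no small-sample correction is applied to the cluster-adjusted variance.

```python
import math


def naive_and_cluster_se(values, cluster_ids):
    """Return (mean, naive SE, cluster-adjusted SE) for a sample mean.

    Hypothetical helper for illustration only.
    """
    n = len(values)
    mean = sum(values) / n
    residuals = [v - mean for v in values]

    # Naive variance of the mean: treats all n observations as independent.
    naive_var = sum(r * r for r in residuals) / (n - 1) / n

    # Cluster-adjusted (sandwich) variance: sum the residuals within each
    # cluster first, so that within-cluster correlation is retained.
    totals = {}
    for r, c in zip(residuals, cluster_ids):
        totals[c] = totals.get(c, 0.0) + r
    robust_var = sum(t * t for t in totals.values()) / (n * n)

    return mean, math.sqrt(naive_var), math.sqrt(robust_var)


# Made-up outcome values: two twin pairs ("a", "b") and four singletons.
values = [1.0, 1.2, 3.0, 3.1, 2.0, 0.5, 3.6, 2.2]
clusters = ["a", "a", "b", "b", "c", "d", "e", "f"]

mean, naive_se, robust_se = naive_and_cluster_se(values, clusters)
print(f"mean={mean:.3f} naive SE={naive_se:.3f} cluster-adjusted SE={robust_se:.3f}")
```

In this made-up sample the twins within each pair have similar values, so the cluster-adjusted standard error comes out larger than the naive one. With few or no multiples the two would be expected to differ little, which is in line with the abstract's remark about datasets with a small percentage of clusters larger than size 1.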

This record has no associated files available for download.

More information

Published date: 2009

Identifiers

Local EPrints ID: 72775
URI: http://eprints.soton.ac.uk/id/eprint/72775
ISSN: 0269-5022
PURE UUID: 6e9a25c8-e775-40a9-aeb6-066a6e84a22b

Catalogue record

Date deposited: 24 Feb 2010
Last modified: 13 Mar 2024 21:40

Contributors

Author: Louise Marston
Author: Janet L. Peacock
Author: Kemin Yu
Author: Peter Brocklehurst
Author: Sandra Calvert
Author: Anne Greenough
Author: Neil Marlow
