The University of Southampton
University of Southampton Institutional Repository

On the use of hierarchical models for multiple imputation and synthetic data generation

Rashid, Sana (2017) On the use of hierarchical models for multiple imputation and synthetic data generation. University of Southampton, Doctoral Thesis, 211pp.

Record type: Thesis (Doctoral)

Abstract

Missing data are often imputed with plausible values so that analyses can be performed. One popular approach is multiple imputation, which requires the specification of a suitable imputation model. This thesis investigates the impact on multiply imputed hierarchical datasets when the imputation model is misspecified. The first issue studied is omitted variable bias. The same issue is then studied with a focus on the use of multiple imputation to create synthetic data that protect data confidentiality. Here, the quality of the multiply imputed datasets is assessed not only through the performance of various analysis models but also through the risk of disclosure of sensitive data. Using simulation studies and a longitudinal dataset of establishments in Germany, the detrimental effect of this model misspecification is evaluated, and recommendations are made for users of multiple imputation for both missing and synthetic data. The second issue investigated is model misspecification arising from incorrect modelling of the shape of the error term. Existing methods for robust regression and alternatives to the normal distribution are compared, within the synthetic data context only. Results from simulation studies and data on household wealth in the UK are used to identify appropriate methods for multiple imputation in this scenario.
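Multiple imputation, for both missing and synthetic data, ends with a pooling step in which the results from the m completed (or synthetic) datasets are combined into one estimate and one variance. This record does not say which combining rules the thesis applies, so the following is only a minimal sketch of the textbook versions, assuming a scalar estimate: Rubin's rules for missing data, and the variant commonly used for partially synthetic data. The function name `pool_estimates` and its `synthetic` parameter are hypothetical.

```python
from statistics import mean, variance


def pool_estimates(estimates, variances, synthetic=None):
    """Pool m scalar estimates and their within-dataset variances.

    Hypothetical helper for illustration only.
    synthetic=None      -> missing data (Rubin's rules): T = u_bar + (1 + 1/m) * b
    synthetic="partial" -> partially synthetic data:     T = u_bar + b / m
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two datasets, each with one variance")
    q_bar = mean(estimates)   # pooled point estimate
    u_bar = mean(variances)   # average within-dataset variance
    b = variance(estimates)   # between-dataset variance (divisor m - 1)
    if synthetic is None:
        total = u_bar + (1 + 1 / m) * b
    elif synthetic == "partial":
        total = u_bar + b / m
    else:
        raise ValueError("synthetic must be None or 'partial'")
    return q_bar, total


# Three imputed datasets: pooled estimate 1.0, total variance about 0.0933.
print(pool_estimates([1.0, 1.2, 0.8], [0.04, 0.05, 0.03]))
```

The two rules differ only in how the between-dataset variance b enters the total: with missing data the imputed values add genuine uncertainty about the unobserved values, whereas with partially synthetic data the observed values are known to the data holder and only the synthesis adds noise, so b is down-weighted by 1/m.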

Text
Sana_Rashid - Version of Record
Available under License University of Southampton Thesis Licence.

More information

Published date: 4 April 2017

Identifiers

Local EPrints ID: 412632
URI: http://eprints.soton.ac.uk/id/eprint/412632
PURE UUID: b6f041a4-4a4d-4ac9-aef4-97d5ce78a34d

Catalogue record

Date deposited: 24 Jul 2017 16:32
Last modified: 30 Apr 2020 04:01

Contributors

Author: Sana Rashid
Thesis advisor: Robin Mitra
Thesis advisor: Nikos Kouris
