The University of Southampton
University of Southampton Institutional Repository

Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models

Lee, Min and Mitra, Robin (2016) Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models. Computational Statistics & Data Analysis, 95, 24-38. (doi:10.1016/j.csda.2015.08.004).

Record type: Article

Abstract

Multiple imputation is a commonly used approach to dealing with missing values. In this approach, an imputer repeatedly imputes the missing values by taking draws from the posterior predictive distribution for the missing values conditional on the observed values, and releases these completed data sets to analysts. With each completed data set the analyst performs the analysis of interest, treating the data as if they were fully observed. The results of these analyses are then combined using standard combining rules, allowing the analyst to make appropriate inferences that take into account the uncertainty due to the missing data. To preserve the statistical properties present in the data, the imputer must use a plausible distribution to generate the imputed values. In data sets containing variables with different measurement scales, e.g. some categorical and some continuous variables, this is a challenging problem. A method is proposed to multiply impute missing values in such data sets by modelling the joint distribution of the variables in the data through a sequence of generalised linear models, and data augmentation methods are used to draw imputations from a proper posterior distribution using Markov chain Monte Carlo (MCMC). The performance of the proposed method is illustrated using simulation studies and a data set taken from a breastfeeding study.
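
The abstract refers to "standard combining rules" without stating them. A minimal sketch of what such rules commonly look like is given below, assuming they are Rubin's rules for a scalar estimand (this record does not say which rules the paper uses). The function name combine_estimates and the example numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def combine_estimates(estimates, variances):
    """Pool m completed-data analyses of a scalar quantity.

    Assumption: Rubin's combining rules. `estimates` holds the point
    estimate from each completed data set and `variances` the matching
    estimated variances. Both names are illustrative only.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two analyses and matching lengths")

    pooled_estimate = mean(estimates)   # average of the m point estimates
    within_var = mean(variances)        # average within-imputation variance
    between_var = variance(estimates)   # sample variance, divisor m - 1

    # Total variance: within + (1 + 1/m) * between. The between term carries
    # the extra uncertainty caused by the missing data.
    total_var = within_var + (1 + 1 / m) * between_var
    return pooled_estimate, total_var


# Hypothetical results from m = 3 completed data sets.
pooled, total = combine_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The total variance exceeds the average within-imputation variance whenever the estimates differ across completed data sets, which is how the analyst's inference reflects the missing data.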

Text
leemitras3riworkingpaper.pdf - Accepted Manuscript

More information

Accepted/In Press date: 6 August 2015
e-pub ahead of print date: 9 September 2015
Published date: March 2016
Keywords: data augmentation, latent variable, missing data, multiple imputation
Organisations: Statistical Sciences Research Institute

Identifiers

Local EPrints ID: 361994
URI: http://eprints.soton.ac.uk/id/eprint/361994
ISSN: 0167-9473
PURE UUID: 675ce750-d8ef-4013-92ca-814ef47fa659

Catalogue record

Date deposited: 11 Feb 2014 15:23
Last modified: 14 Mar 2024 15:59

Contributors

Author: Min Lee
Author: Robin Mitra
