Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models
Multiple imputation is a commonly used approach for dealing with missing values. In this approach, an imputer repeatedly imputes the missing values by drawing from the posterior predictive distribution of the missing values given the observed values, and releases the resulting completed data sets to analysts. The analyst performs the analysis of interest on each completed data set, treating the data as if they were fully observed. These analyses are then combined using standard combining rules, allowing the analyst to make inferences that account for the uncertainty due to the missing data. To preserve the statistical properties of the data, the imputer must generate the imputed values from a plausible distribution. This is a challenging problem in data sets containing variables with different measurement scales, e.g. some categorical and some continuous variables. A method is proposed to multiply impute missing values in such data sets by modelling the joint distribution of the variables through a sequence of generalised linear models; data augmentation methods are used to draw imputations from a proper posterior distribution using Markov chain Monte Carlo (MCMC). The performance of the proposed method is illustrated using simulation studies and a data set taken from a breastfeeding study.
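The standard combining rules referred to above are commonly Rubin's rules. A minimal sketch for a single scalar quantity, assuming each of the m completed-data analyses returns a point estimate and its estimated sampling variance; the function name `combine_rubin` is hypothetical and not taken from the paper:

```python
import math
from statistics import mean, variance


def combine_rubin(estimates, variances):
    """Pool m completed-data analyses of one scalar quantity (Rubin's rules).

    estimates -- point estimates q_1..q_m, one per completed data set
    variances -- their estimated sampling variances u_1..u_m
    Returns (q_bar, total_variance, degrees_of_freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance of q_bar
    if b == 0:
        df = math.inf                # imputations agree: no extra missing-data uncertainty
    else:
        # Rubin's degrees of freedom for the t reference distribution
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df


q, t, df = combine_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q, t, df)
```

The between-imputation term `b` is what carries the extra uncertainty due to the missing data; when all completed data sets give the same estimate, the total variance reduces to the average within-imputation variance.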
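Modelling a joint distribution through a sequence of conditional models means factorising it as p(y_1, ..., y_p) = p(y_1) p(y_2 | y_1) ... p(y_p | y_1, ..., y_{p-1}), with each factor a generalised linear model. The sketch below illustrates only one link of such a sequence, under assumptions that are not taken from the paper: a binary variable y with a probit link on a single fully observed covariate x, a flat prior on the coefficients, and the Albert and Chib latent-variable Gibbs sampler as the data augmentation step. The function names `draw_latent`, `draw_beta` and `impute_probit` are hypothetical, and this is not the authors' algorithm:

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()   # standard normal N(0, 1)
EPS = 1e-12                 # keeps probabilities strictly inside (0, 1) for inv_cdf


def draw_latent(mu, y, rng):
    """Draw z ~ N(mu, 1) truncated to z > 0 if y == 1, z <= 0 if y == 0;
    untruncated if y is None (missing)."""
    p0 = STD_NORMAL.cdf(-mu)            # P(z <= 0) when z ~ N(mu, 1)
    if y is None:
        lo, hi = 0.0, 1.0
    elif y == 1:
        lo, hi = p0, 1.0
    else:
        lo, hi = 0.0, p0
    u = min(max(rng.uniform(lo, hi), EPS), 1.0 - EPS)
    return mu + STD_NORMAL.inv_cdf(u)   # inverse-CDF draw from the truncated normal


def draw_beta(x, z, rng):
    """Draw (b0, b1) from N((X'X)^-1 X'z, (X'X)^-1) with X = [1, x] (flat prior)."""
    n, sx = len(x), sum(x)
    sxx = sum(v * v for v in x)
    sz = sum(z)
    sxz = sum(a * c for a, c in zip(x, z))
    det = n * sxx - sx * sx
    v00, v01, v11 = sxx / det, -sx / det, n / det        # entries of (X'X)^-1
    m0, m1 = v00 * sz + v01 * sxz, v01 * sz + v11 * sxz  # posterior mean
    l00 = v00 ** 0.5                                     # 2x2 Cholesky factor
    l10 = v01 / l00
    l11 = (v11 - l10 * l10) ** 0.5
    e0, e1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return m0 + l00 * e0, m1 + l10 * e0 + l11 * e1


def impute_probit(x, y, n_iter=600, burn_in=100, thin=100, seed=1):
    """Gibbs sampler over (z, b0, b1); returns completed copies of y,
    one per retained iteration, with each missing entry set to 1 if its z > 0."""
    rng = random.Random(seed)
    b0, b1 = 0.0, 0.0
    completed_sets = []
    for it in range(n_iter):
        z = [draw_latent(b0 + b1 * xi, yi, rng) for xi, yi in zip(x, y)]
        b0, b1 = draw_beta(x, z, rng)
        if it >= burn_in and (it - burn_in) % thin == 0:
            completed_sets.append(
                [yi if yi is not None else int(zi > 0) for yi, zi in zip(y, z)]
            )
    return completed_sets


x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
y = [0, 0, 1, 0, None, 1, 0, 1, None, 1]
for completed in impute_probit(x, y):
    print(completed)
```

Because a missing y enters only through an untruncated latent z, its imputation falls out of the ordinary data augmentation step at no extra cost. Thinning the chain spaces out the retained completed data sets so that they are closer to independent draws; in a longer sequence of models, the completed y could then serve as a covariate in the model for the next variable.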
Keywords: data augmentation, latent variable, missing data, multiple imputation
Pages: 24-38
Lee, Min
Mitra, Robin
March 2016
Lee, Min and Mitra, Robin (2016) Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models. Computational Statistics & Data Analysis, 95, 24-38. (doi:10.1016/j.csda.2015.08.004).
More information
Accepted/In Press date: 6 August 2015
e-pub ahead of print date: 9 September 2015
Published date: March 2016
Organisations:
Statistical Sciences Research Institute
Identifiers
Local EPrints ID: 361994
URI: http://eprints.soton.ac.uk/id/eprint/361994
ISSN: 0167-9473
PURE UUID: 675ce750-d8ef-4013-92ca-814ef47fa659
Catalogue record
Date deposited: 11 Feb 2014 15:23
Last modified: 14 Mar 2024 15:59
Contributors
Author:
Min Lee