The University of Southampton
University of Southampton Institutional Repository

Robust quasi-randomization-based estimation with ensemble learning for missing data

Lee, Danhyang, Zhang, Li-Chun and Chen, Sixia (2022) Robust quasi-randomization-based estimation with ensemble learning for missing data. Scandinavian Journal of Statistics. (doi:10.1111/sjos.12626).

Record type: Article

Abstract

Missing data analysis requires assumptions about an outcome model or a response probability model to adjust for potential bias due to nonresponse. Doubly robust (DR) estimators are consistent if at least one of the two models is correctly specified. Multiply robust (MR) estimators extend DR estimators by allowing multiple working models for the outcome and/or the response probability, and are consistent if at least one of these models is correctly specified. We propose a robust quasi-randomization-based model approach that gives more protection against model misspecification than the existing DR and MR estimators, and in which any number of semiparametric, nonparametric or machine learning models can be used for the outcome variable. Given cell-homogeneous response, the proposed estimator achieves unbiasedness by using a subsampling Rao–Blackwell method, regardless of the working models used for the outcome. An unbiased variance estimation formula is proposed, which does not use any replicate jackknife or bootstrap methods. A simulation study shows that our proposed method outperforms the existing multiply robust estimators.
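
As background for the DR estimators mentioned in the abstract, the following is a minimal sketch of the standard textbook doubly robust (augmented inverse probability weighted) estimator of a mean. It is not the estimator proposed in the paper. The function names `cell_response_rates` and `dr_mean` and the toy data are hypothetical, chosen for illustration only, and the response probability is assumed constant within each cell and estimated by the cell response rate.

```python
import numpy as np


def cell_response_rates(cells, r):
    """Estimate each unit's response probability by its cell's response rate.

    Assumes response is homogeneous within cells. A cell with no
    respondents gets rate 0 and would break the weighting below.
    """
    cells = np.asarray(cells)
    r = np.asarray(r, dtype=float)
    rates = np.empty(len(r), dtype=float)
    for c in np.unique(cells):
        in_cell = cells == c
        rates[in_cell] = r[in_cell].mean()
    return rates


def dr_mean(y, r, m_hat, p_hat):
    """Textbook doubly robust estimate of the mean of y.

    y     : outcomes, may be NaN where r == 0
    r     : response indicators (1 = observed, 0 = missing)
    m_hat : outcome-model predictions for every unit
    p_hat : estimated response probabilities for every unit
    """
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    # Residuals contribute only for respondents; missing y is never used.
    resid = np.where(r == 1, y - m_hat, 0.0)
    return float(np.mean(m_hat + resid / p_hat))


# Toy data (hypothetical): two cells, two nonrespondents.
cells = np.array([0, 0, 1, 1, 1, 1])
r = np.array([1, 0, 1, 1, 1, 0])
y = np.array([2.0, np.nan, 4.0, 6.0, 8.0, np.nan])

p_hat = cell_response_rates(cells, r)

# Two deliberately crude outcome models: predict 0 everywhere, or 1 everywhere.
est_zero = dr_mean(y, r, np.zeros(len(y)), p_hat)
est_one = dr_mean(y, r, np.ones(len(y)), p_hat)
print(est_zero, est_one)
```

In this toy example both crude outcome models give the same estimate, 28/6 (about 4.67). The inverse response-rate weights sum to the cell size within each cell, so a constant shift in the outcome predictions cancels out. This only illustrates why a correct response model can protect against a wrong outcome model; the paper's subsampling Rao–Blackwell construction is not reproduced here.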

Text: main_revised_final_LZC - Accepted Manuscript (204kB)

More information

Accepted/In Press date: 21 November 2022
e-pub ahead of print date: 11 December 2022
Published date: 20 December 2022
Funding Information: Dr. Sixia Chen was partially supported by the National Institute on Minority Health and Health Disparities at National Institutes of Health (1R21MD014658‐01A1) and the Oklahoma Shared Clinical and Translational Resources (U54GM104938) with an Institutional Development Award (IDeA) from National Institute of General Medical Sciences. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Publisher Copyright: © 2022 Board of the Foundation of the Scandinavian Journal of Statistics.
Keywords: Rao–Blackwell method, cell mean model, item nonresponse, missing at random, variance estimation

Identifiers

Local EPrints ID: 473264
URI: http://eprints.soton.ac.uk/id/eprint/473264
ISSN: 0303-6898
PURE UUID: 8597ec52-5138-49f1-9e2a-296f861d2322
ORCID for Li-Chun Zhang: orcid.org/0000-0002-3944-9484

Catalogue record

Date deposited: 12 Jan 2023 18:33
Last modified: 17 Mar 2024 07:36

Contributors

Author: Danhyang Lee
Author: Li-Chun Zhang
Author: Sixia Chen
