Estimating from Cross-sectional Categorical Data Subject to Misclassification and Double Sampling: Moment-based, Maximum Likelihood and Quasi-Likelihood Approaches
We discuss the analysis of cross-sectional categorical data in the presence of misclassification and double sampling. In a double sampling context we assume that, along with the main measurement device, which is subject to misclassification, we have a secondary measurement device that is free of error but more expensive to apply. Because of its higher cost, the validation survey is carried out only for a subset of units. Inference under double sampling combines information from both measurement devices. We review previously proposed parameterisations of the misclassification model that use either the calibration or the misclassification probabilities. We then show that the misclassification model can alternatively be formulated as a missing data problem using the misclassification probabilities, in which case the model parameters are estimated by maximum likelihood via the EM algorithm. We suggest that this formulation, as opposed to maximum likelihood estimation using the calibration probabilities, offers a robust basis for extending the model to handle more complex situations. We further illustrate that the likelihood-based approaches offer some practical advantages over the moment-based approaches. As an alternative, we also present a quasi-likelihood parameterisation of the misclassification model. This framework avoids an explicit definition of the likelihood function and provides a different way of resolving the missing data problem. The quasi-likelihood method offers further practical advantages to the data analyst over both the likelihood-based and the moment-based approaches. Variance estimation under the alternative parameterisations is discussed. The different methods are illustrated using two numerical examples and a Monte Carlo simulation study.
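To make the estimators named in the abstract concrete, the following Python sketch contrasts a moment-based estimator built on misclassification probabilities, an estimator built on calibration probabilities, and an EM fit that treats the true category of main-sample units as missing data. It is an illustration under stated assumptions rather than the authors' implementation: the function names, the counts in `V` and `m`, and the choice to estimate the observed margin from the main sample only in the moment-based version are all hypothetical. The quasi-likelihood approach is not sketched here.

```python
import numpy as np

# Hypothetical counts (not taken from the paper).
# V[i, j]: validation units with true category i and observed category j.
# m[j]:    main-sample units with observed category j (true category unknown).
V = np.array([[40.0, 10.0],
              [5.0, 45.0]])
m = np.array([300.0, 200.0])


def moment_estimate(V, m):
    """Moment-based estimate using misclassification probabilities.

    Solves Q^T p = pi_obs, where Q[i, j] = P(observed j | true i) is
    estimated from the validation sample and pi_obs from the main sample.
    """
    Q = V / V.sum(axis=1, keepdims=True)
    pi_obs = m / m.sum()
    return np.linalg.solve(Q.T, pi_obs)


def calibration_estimate(V, m):
    """Estimate using calibration probabilities C[i, j] = P(true i | observed j)."""
    C = V / V.sum(axis=0, keepdims=True)
    # Observed-category margin pooled over validation and main samples.
    pi_obs = (V.sum(axis=0) + m) / (V.sum() + m.sum())
    return C @ pi_obs


def em_estimate(V, m, tol=1e-12, max_iter=5000):
    """EM with misclassification probabilities; true category missing in main sample."""
    k = V.shape[0]
    p = np.full(k, 1.0 / k)                  # starting prevalences
    Q = V / V.sum(axis=1, keepdims=True)     # starting misclassification matrix
    n_total = V.sum() + m.sum()
    for _ in range(max_iter):
        # E-step: allocate main-sample counts to true categories
        # in proportion to P(true i | observed j).
        joint = p[:, None] * Q
        posterior = joint / joint.sum(axis=0, keepdims=True)
        completed = V + posterior * m[None, :]
        # M-step: closed-form updates from the completed table.
        p_new = completed.sum(axis=1) / n_total
        Q_new = completed / completed.sum(axis=1, keepdims=True)
        change = max(np.abs(p_new - p).max(), np.abs(Q_new - Q).max())
        p, Q = p_new, Q_new
        if change < tol:
            break
    return p, Q


p_moment = moment_estimate(V, m)
p_calib = calibration_estimate(V, m)
p_em, Q_em = em_estimate(V, m)
print("moment-based:", p_moment)
print("calibration: ", p_calib)
print("EM:          ", p_em)
```

When the validation units are a simple random subsample and all cell counts are positive, the calibration and misclassification parameterisations describe the same saturated model, so the EM fit and the calibration-based estimate should agree up to numerical tolerance, while the matrix-inversion moment estimate can differ.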
Southampton Statistical Sciences Research Institute, University of Southampton
Tzavidis, Nikos
Lin, Yan-Xia
2004
Tzavidis, Nikos and Lin, Yan-Xia (2004) Estimating from Cross-sectional Categorical Data Subject to Misclassification and Double Sampling: Moment-based, Maximum Likelihood and Quasi-Likelihood Approaches (S3RI Methodology Working Papers, M04/03) Southampton, UK. Southampton Statistical Sciences Research Institute, University of Southampton, 34pp.
Record type: Monograph (Project Report)
More information
Published date: 2004
Identifiers
Local EPrints ID: 8176
URI: http://eprints.soton.ac.uk/id/eprint/8176
PURE UUID: 02954874-e98d-4f50-ad6f-4c17f4a2c6b6
Catalogue record
Date deposited: 11 Jul 2004
Last modified: 16 Mar 2024 03:23
Contributors
Author: Nikos Tzavidis
Author: Yan-Xia Lin