Imputation vs. Estimation of Finite Population Distributions
Southampton Statistical Sciences Research Institute, University of Southampton
Chambers, R. L.
25 January 2005
Chambers, R. L. (2005) Imputation vs. Estimation of Finite Population Distributions (S3RI Methodology Working Papers, M05/06). Southampton, UK: Southampton Statistical Sciences Research Institute, University of Southampton. 25pp.
Record type: Monograph (Working Paper)
Abstract
Estimates of the distribution of hourly wage rates for employees are an important output for a national statistics agency. However, many employees are not paid by the hour, so their hourly wage rate data are effectively missing in a survey that attempts to collect this information. A standard approach in this situation is to impute the missing values using derived measures of the wage rate, based on salary and hours-worked data also collected in the survey. This paper contrasts this imputation approach with direct estimation of the wage rate distribution using the derived wage rate variable as an auxiliary. In particular, we focus on data obtained in the 2002 UK New Earnings Survey and use simulation based on the actual and derived hourly wage rate data collected in this survey. The simulation compares two imputation approaches with two estimation approaches that use the derived wage rate as an auxiliary. The first imputation approach substitutes the derived wage rate values for the missing actual values; the second uses nearest neighbour imputation based on the derived wage rate. The first estimation approach is a semi-parametric extension of the Chambers and Dunstan (1986) estimator of the finite population distribution function; the second is a calibrated spline-based estimator of this function recently suggested by Harms and Duchesne (2004). Our conclusion is that an approach based on the semi-parametric estimator is best for these data. However, confidence interval estimation remains an open problem.
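The two imputation approaches named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the use of NaN to mark a missing actual rate, and the toy figures are all assumptions made for illustration, and survey weights are ignored. It does not attempt the Chambers and Dunstan (1986) or Harms and Duchesne (2004) estimators.

```python
import numpy as np

def substitute_impute(actual, derived):
    """Replace each missing actual rate (NaN) with that employee's derived rate."""
    actual = np.asarray(actual, dtype=float)
    derived = np.asarray(derived, dtype=float)
    return np.where(np.isnan(actual), derived, actual)

def nearest_neighbour_impute(actual, derived):
    """Replace each missing actual rate with the actual rate of the donor
    (an employee with an observed rate) whose derived rate is closest."""
    actual = np.asarray(actual, dtype=float)
    derived = np.asarray(derived, dtype=float)
    missing = np.isnan(actual)
    donors = np.flatnonzero(~missing)
    # Absolute distance in derived rate between each recipient and each donor.
    dist = np.abs(derived[missing][:, None] - derived[donors][None, :])
    out = actual.copy()
    out[missing] = actual[donors[dist.argmin(axis=1)]]
    return out

def empirical_cdf(values, t):
    """Unweighted proportion of values less than or equal to threshold t."""
    return float(np.mean(np.asarray(values) <= t))

# Hypothetical toy figures, not New Earnings Survey data.
actual = np.array([5.0, np.nan, 8.0, np.nan, 12.0])   # observed hourly rates
derived = np.array([5.2, 6.9, 8.1, 11.0, 12.4])       # salary / hours worked

sub = substitute_impute(actual, derived)
nn = nearest_neighbour_impute(actual, derived)
print(empirical_cdf(sub, 7.0), empirical_cdf(nn, 7.0))
```

In this toy case the two completed data sets give different estimates of the proportion of employees at or below a given rate, which is the kind of difference the simulation study in the paper is designed to assess.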
Text: 14076-01.pdf (Other)
More information
Published date: 25 January 2005
Identifiers
Local EPrints ID: 14076
URI: http://eprints.soton.ac.uk/id/eprint/14076
PURE UUID: 2ec8691c-b8df-4723-8b9c-cb7ac83d7561
Catalogue record
Date deposited: 26 Jan 2005
Last modified: 15 Mar 2024 05:18
Contributors
Author: R. L. Chambers