Chambers, R. L.
Imputation vs. Estimation of Finite Population Distributions, Southampton, UK: Southampton Statistical Sciences Research Institute, 25pp.
(S3RI Methodology Working Papers, M05/06).
Estimates of the distribution of hourly wage rates for employees are an important output for a national statistics agency. However, many employees are not paid by the hour, so their hourly wage rate data are effectively missing in a survey that attempts to collect this information. A standard approach in this situation is to impute these missing values using a derived measure of the wage rate, based on the salary and hours-worked data also collected in the survey. This paper contrasts this imputation approach with direct estimation of the wage rate distribution using the derived wage rate variable as an auxiliary. In particular, we focus on data obtained in the 2002 UK New Earnings Survey and use a simulation study based on the actual and derived hourly wage rate data collected in this survey to compare two imputation approaches with two estimation approaches that use the derived wage rate as an auxiliary. The first imputation approach substitutes the derived wage rate values for the missing actual values; the second uses nearest neighbour imputation based on the derived wage rate. The first estimation approach is a semi-parametric extension of the Chambers and Dunstan (1986) estimator of the finite population distribution function; the second is a calibrated spline-based estimator of this function recently suggested by Harms and Duchesne (2004). Our conclusion is that an approach based on the semi-parametric estimator is best for these data. However, confidence interval estimation remains an open problem.
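To make the imputation-versus-estimation contrast concrete, the sketch below computes an estimate of the distribution function F(t) (the share of employees with actual hourly rate y at most t) in three ways: substituting the derived rate x for missing y, nearest neighbour imputation on x, and an estimator in the spirit of Chambers and Dunstan (1986). It is a minimal illustration under stated assumptions, not the paper's method. The function names are hypothetical. The Chambers–Dunstan-type version assumes a simple model y = beta * x + e with homoscedastic errors fitted by least squares through the origin. It does not implement the paper's semi-parametric extension or the Harms and Duchesne (2004) calibrated spline estimator, and it ignores survey weights.

```python
import numpy as np


def substitution_cdf(t, y_obs, x_miss):
    """Hypothetical helper: impute each missing y by its derived value x, then take the empirical CDF."""
    y_all = np.concatenate([np.asarray(y_obs, float), np.asarray(x_miss, float)])
    return np.mean(y_all <= t)


def nearest_neighbour_cdf(t, y_obs, x_obs, x_miss):
    """Hypothetical helper: donate y from the observed unit whose derived x is closest."""
    y_obs = np.asarray(y_obs, float)
    x_obs = np.asarray(x_obs, float)
    x_miss = np.asarray(x_miss, float)
    # Distance from every missing unit to every observed unit; ties go to the first donor
    donors = np.argmin(np.abs(x_miss[:, None] - x_obs[None, :]), axis=1)
    y_all = np.concatenate([y_obs, y_obs[donors]])
    return np.mean(y_all <= t)


def chambers_dunstan_type_cdf(t, y_obs, x_obs, x_miss):
    """Hypothetical helper: Chambers-Dunstan-type estimate of F(t).

    Assumes y = beta * x + e with homoscedastic errors (a simplification).
    Observed units contribute their own indicator I(y <= t); each unit with
    missing y contributes the proportion of fitted values beta * x_j + e_i
    (over the empirical residuals e_i) that are <= t.
    """
    y_obs = np.asarray(y_obs, float)
    x_obs = np.asarray(x_obs, float)
    x_miss = np.asarray(x_miss, float)
    # Least-squares slope through the origin
    beta = np.sum(x_obs * y_obs) / np.sum(x_obs ** 2)
    resid = y_obs - beta * x_obs
    observed_part = np.sum(y_obs <= t)
    # Rows: missing units; columns: residuals borrowed from observed units
    pred = beta * x_miss[:, None] + resid[None, :]
    missing_part = np.sum(np.mean(pred <= t, axis=1))
    return (observed_part + missing_part) / (len(y_obs) + len(x_miss))


# Toy data (invented for illustration): four employees paid by the hour, one not
y_obs = [1.0, 2.0, 3.0, 4.0]
x_obs = [1.0, 2.0, 3.0, 4.0]
x_miss = [2.5]
print(substitution_cdf(2.5, y_obs, x_miss))
print(nearest_neighbour_cdf(2.5, y_obs, x_obs, x_miss))
print(chambers_dunstan_type_cdf(2.5, y_obs, x_obs, x_miss))
```

The design difference is the point of the comparison. Both imputation helpers commit each missing unit to a single value. The Chambers–Dunstan-type helper spreads each missing unit over a predictive distribution built from model residuals, which tends to preserve the spread of the wage rate distribution rather than collapsing it.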