Population size estimation using covariates having missing values and measurement error: estimating ethnic group sizes in New Zealand
We investigate the use of multiple linked lists for population size estimation and to estimate the relationships between covariates appearing on the lists. Over the lists, the covariates aim to measure the same concept. The relationships between the covariates are not fully known because of missing values on the covariates: some cases do not appear in some lists; some cases are on one or more of the lists but have missing covariate values on some of the lists; and some cases are not observed in any list. In earlier work, multiple system estimation has been combined with latent class analysis to give a consensus estimate where an underlying dichotomous categorical covariate is measured differently in different lists. This was applied to ethnicity covariates in New Zealand with two levels, Māori and non-Māori. In this paper, we apply this approach to ethnicity covariates with a larger number of categories, and find that it produces satisfactory results with four categories. We assess the purity of the latent classes using entropy and conditional probability measures. We also examine the evolution of annual estimates from multiple lists (where one list is the population census) over 2013–2020, finding that the estimated latent class proportions are very stable. We assess the impact of disclosure control measures on the outputs.
Keywords: administrative data, capture–recapture, entropy, latent class multiple system estimation, purity of latent classes
Pages: 423-453
Smith, Paul A.
a2548525-4f99-4baf-a4d0-2b216cce059c
van der Heijden, Peter G.M.
85157917-3b33-4683-81be-713f987fd612
Cruyff, Maarten
68bcfa19-3d85-4b0f-a6a4-6e148b265f19
Pantalone, Francesco
c1b85bef-a71c-4851-9807-7776bc0b5ded
Diener, Hannes
cf985ec7-e655-446f-a283-3fb56510e7de
Dunstan, Kim
bf46d7f9-023a-4875-86a8-f152332eb5ca
22 September 2025
Smith, Paul A., van der Heijden, Peter G.M., Cruyff, Maarten, Pantalone, Francesco, Diener, Hannes and Dunstan, Kim
(2025)
Population size estimation using covariates having missing values and measurement error: estimating ethnic group sizes in New Zealand.
Australian & New Zealand Journal of Statistics, 67 (3), 423-453.
(doi:10.1111/anzs.70014).
Text
Smith et al. (2025) Maori and Pacific
- Version of Record
More information
Accepted/In Press date: 10 February 2025
e-pub ahead of print date: 19 June 2025
Published date: 22 September 2025
Identifiers
Local EPrints ID: 502353
URI: http://eprints.soton.ac.uk/id/eprint/502353
ISSN: 1369-1473
PURE UUID: be48cbab-3de8-4386-a441-248ae3f23396
Catalogue record
Date deposited: 24 Jun 2025 16:34
Last modified: 22 Sep 2025 17:13
Contributors
Author: Paul A. Smith
Author: Peter G.M. van der Heijden
Author: Maarten Cruyff
Author: Francesco Pantalone
Author: Hannes Diener
Author: Kim Dunstan