The University of Southampton
University of Southampton Institutional Repository

Population size estimation using covariates having missing values and measurement error: estimating ethnic group sizes in New Zealand

Smith, Paul A., van der Heijden, Peter G.M., Cruyff, Maarten, Pantalone, Francesco, Diener, Hannes and Dunstan, Kim (2025) Population size estimation using covariates having missing values and measurement error: estimating ethnic group sizes in New Zealand. Australian & New Zealand Journal of Statistics, 67 (3), 423-453. (doi:10.1111/anzs.70014).

Record type: Article

Abstract

We investigate the use of multiple linked lists for population size estimation and to estimate the relationships between covariates appearing on the lists. Over the lists, the covariates aim to measure the same concept. The relationships between the covariates are not fully known because of missing values on the covariates: some cases do not appear in some lists; some cases are on one or more of the lists but have missing covariate values on some of the lists; and some cases are not observed in any list. In earlier work, multiple system estimation has been combined with latent class analysis to give a consensus estimate where an underlying dichotomous categorical covariate is measured differently in different lists. This was applied to ethnicity covariates in New Zealand with two levels, Māori and non-Māori. In this paper, we apply this approach to ethnicity covariates with a larger number of categories, and find that it produces satisfactory results with four categories. We assess the purity of the latent classes using entropy and conditional probability measures. We also examine the evolution of annual estimates from multiple lists (where one list is the population census) over 2013–2020, finding that the estimated latent class proportions are very stable. We assess the impact of disclosure control measures on the outputs.
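The abstract says the purity of the latent classes is assessed with entropy. As a hedged illustration only, the Python sketch below computes one entropy-based summary that is widely used in latent class analysis: the relative (normalised) entropy E = 1 - H / (N ln K). Here H sums -p ln p over cases and classes, N is the number of cases, and K is the number of latent classes. This record does not give the exact measure the authors use, so treat the formula as an assumption. The function name `relative_entropy`, the optional frequency weights, and the toy probabilities are also hypothetical. The sketch does not fit the latent class multiple system estimation model. It only summarises posterior class-membership probabilities that are assumed to be available already.

```python
import math
from typing import Optional, Sequence


def relative_entropy(posteriors: Sequence[Sequence[float]],
                     weights: Optional[Sequence[float]] = None) -> float:
    """Return 1 - H / (N ln K) for posterior latent class probabilities.

    posteriors[i][k]: posterior probability that case (or response
    pattern) i belongs to latent class k; each row should sum to 1.
    weights[i]: optional frequency of pattern i (defaults to 1 per row).
    The result lies in [0, 1]; values near 1 mean sharply separated classes.
    """
    if not posteriors:
        raise ValueError("need at least one row of posteriors")
    k = len(posteriors[0])
    if k < 2:
        raise ValueError("need at least two latent classes")
    if weights is None:
        weights = [1.0] * len(posteriors)
    if len(weights) != len(posteriors):
        raise ValueError("weights must match the number of rows")

    total_entropy = 0.0  # weighted sum of per-row entropies, in nats
    n = 0.0              # weighted number of cases N
    for row, w in zip(posteriors, weights):
        if len(row) != k:
            raise ValueError("every row needs the same number of classes")
        # Convention 0 * ln(0) = 0: zero probabilities add nothing.
        h = -sum(p * math.log(p) for p in row if p > 0)
        total_entropy += w * h
        n += w
    if n <= 0:
        raise ValueError("total weight must be positive")
    return 1.0 - total_entropy / (n * math.log(k))


# Hypothetical posteriors for three cases over four latent classes.
sharp = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
flat = [[0.25, 0.25, 0.25, 0.25]] * 3
print(relative_entropy(sharp))  # 1.0 (every case assigned with certainty)
print(relative_entropy(flat))   # about 0.0 (no information about class)
```

The optional weights let each row stand for one observed pattern of list memberships and recorded categories, multiplied by its count. This is an assumption about how such posteriors might be organised and is not taken from the paper.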

Text
Smith et al. (2025) Maori and Pacific - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 10 February 2025
e-pub ahead of print date: 19 June 2025
Published date: 22 September 2025
Keywords: administrative data, capture–recapture, entropy, latent class multiple system estimation, purity of latent classes

Identifiers

Local EPrints ID: 502353
URI: http://eprints.soton.ac.uk/id/eprint/502353
ISSN: 1369-1473
PURE UUID: be48cbab-3de8-4386-a441-248ae3f23396
ORCID for Paul A. Smith: orcid.org/0000-0001-5337-2746
ORCID for Peter G.M. van der Heijden: orcid.org/0000-0002-3345-096X
ORCID for Francesco Pantalone: orcid.org/0000-0002-7943-7007

Catalogue record

Date deposited: 24 Jun 2025 16:34
Last modified: 22 Sep 2025 17:13

Contributors

Author: Paul A. Smith
Author: Peter G.M. van der Heijden
Author: Maarten Cruyff
Author: Francesco Pantalone
Author: Hannes Diener
Author: Kim Dunstan

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.