The University of Southampton
University of Southampton Institutional Repository

Estimation of population totals from imperfect census, survey and administrative records

Baffour-Awuah, Bernard (2009) Estimation of population totals from imperfect census, survey and administrative records. University of Southampton, School of Social Sciences, Doctoral Thesis, 198pp.

Record type: Thesis (Doctoral)

Abstract

The theoretical framework for estimating population totals from the Census, a Survey and an Administrative Records List is based on capture-recapture methodology, which has traditionally been used to measure the abundance of biological populations. Under this framework, to estimate the unknown population total, N, an initial set of individuals is captured, and further captures are taken at later periods. The possible capture histories can be represented by the cells of a 2^r contingency table, where r is the number of captures. This table has one cell missing, corresponding to the population missed in all r captures. If this cell count can be estimated, adding it to the sum of the observed cells yields the population size of interest. A number of models may be specified for the incomplete table of 2^r - 1 observed counts, and if a model is found that fits these counts adequately, an estimate of the unobserved cell can be derived. The thesis concentrates on the log-linear specification of capture-recapture models.
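As a minimal illustration of the table structure described above, the sketch below enumerates the 2^r capture histories for r lists and separates out the single unobserved history (missed by every list). The function names are hypothetical and chosen for this example only; the estimate of the missing cell is taken as an input, because how it is obtained depends on the model fitted.

```python
from itertools import product


def capture_histories(r):
    """All 2**r capture histories for r lists (1 = captured, 0 = missed)."""
    return list(product((1, 0), repeat=r))


def observed_histories(r):
    """The 2**r - 1 histories that can be observed: captured at least once."""
    return [h for h in capture_histories(r) if any(h)]


def population_total(observed_counts, missing_cell_estimate):
    """Sum of the observed cell counts plus the estimated unobserved cell."""
    return sum(observed_counts.values()) + missing_cell_estimate


if __name__ == "__main__":
    r = 3  # e.g. Census, Survey, Administrative List
    print(len(capture_histories(r)), "cells in the full table")
    print(len(observed_histories(r)), "observable cells")
```

With r = 3 the full table has eight cells, of which seven are observable; the history (0, 0, 0) is the cell that a fitted model has to estimate.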

In the simplest capture-recapture model there are two lists (for example, a Census and a Survey), leading to a 2×2 contingency table with three observed counts and one unobserved cell count. By assuming that capture in the Census is independent of capture in the Survey, an estimate of the unobserved cell can be obtained. It will be shown that when individual capture information is available from the Census, the Survey and a third list (the Administrative List), it is possible to account for different dependencies, specifically the association between capture in the Census and capture in the Survey. The assumption of independence, which is pivotal when there are only two captures, can then be relaxed. However, the introduction of the Administrative List means that overenumeration cannot be assumed to be negligible.
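The two estimators implied by this paragraph can be sketched as follows. The first is the standard dual-system estimator under independence of the two lists. The second is the textbook closed-form estimate for a three-list log-linear model in which Census and Survey may be associated while the Administrative List is assumed independent of both; that particular model choice, the function names and the cell-count arguments are assumptions made for illustration, not necessarily the specification used in the thesis.

```python
def dual_system_estimate(n11, n10, n01):
    """Two lists, independence assumed.

    n11: in both lists; n10: first list only; n01: second list only.
    Independence implies an odds ratio of 1, so n00 = n10 * n01 / n11.
    Returns (estimated missing cell, estimated population total).
    """
    n00 = n10 * n01 / n11
    return n00, n11 + n10 + n01 + n00


def triple_system_estimate(n, allow_cs_dependence=True):
    """Three lists, keyed by (census, survey, admin) with 1 = captured.

    Assumed model: Census and Survey may be associated, while the Admin
    list is independent of the (Census, Survey) pair. The ratio of
    'not in Admin' to 'in Admin' is then the same for every
    Census/Survey combination, which gives the missing cell directly.
    """
    if not allow_cs_dependence:
        raise NotImplementedError("only the [CS][A] model is sketched here")
    not_in_admin = n[(1, 1, 0)] + n[(1, 0, 0)] + n[(0, 1, 0)]
    in_admin = n[(1, 1, 1)] + n[(1, 0, 1)] + n[(0, 1, 1)]
    n000 = n[(0, 0, 1)] * not_in_admin / in_admin
    return n000, sum(n.values()) + n000


if __name__ == "__main__":
    print(dual_system_estimate(n11=80, n10=20, n01=10))
```

Neither function adjusts for overenumeration: both treat every record on every list as a genuine member of the population, which is the assumption the paragraph above says can no longer be taken for granted once the Administrative List is introduced.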

Therefore, the proposal is to use latent class models, in which a latent variable has two classes: one representing real enumerations and the other erroneous enumerations. The classical parameterisation of latent class models assumes local independence, meaning that the Census, Survey and Administrative List are conditionally independent given the latent variable. Consequently, when an individual's enumeration in the Census is associated with their enumeration in the Survey, this latent model is invalidated. There are a number of locally dependent latent class models, but in a triple-system scenario most encounter problems of model identifiability; to be precise, the model solutions are not unique. The thesis therefore investigates the use of the Expectation-Maximization (EM) algorithm to fit a locally dependent (and identifiable) latent model to capture-recapture data from three systems.
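To make the EM idea concrete, the sketch below fits the classical two-class latent class model, with local independence, to counts of binary response patterns. This is deliberately simpler than the model investigated in the thesis: it assumes local independence, it assumes a complete table (every pattern, including the all-zero one, has an observed count), and the starting values and function name are arbitrary choices made for this example.

```python
from math import log


def em_latent_class(counts, n_iter=200):
    """EM for a two-class latent class model with local independence.

    counts: dict mapping a tuple of 0/1 indicators to its observed count.
    Returns (class weights, P(indicator = 1 | class), log-likelihood trace).
    """
    patterns = list(counts)
    n_total = sum(counts.values())
    n_items = len(patterns[0])
    weights = [0.6, 0.4]                             # assumed starting values
    probs = [[0.8] * n_items, [0.3] * n_items]       # assumed starting values
    trace = []
    for _ in range(n_iter):
        # E-step: posterior class membership for each response pattern.
        post = {}
        loglik = 0.0
        for y in patterns:
            joint = []
            for c in range(2):
                lik = weights[c]
                for j, yj in enumerate(y):
                    lik *= probs[c][j] if yj else 1.0 - probs[c][j]
                joint.append(lik)
            marginal = sum(joint)
            post[y] = [v / marginal for v in joint]
            loglik += counts[y] * log(marginal)
        trace.append(loglik)
        # M-step: weighted proportions using the posterior memberships.
        for c in range(2):
            mass = sum(counts[y] * post[y][c] for y in patterns)
            weights[c] = mass / n_total
            for j in range(n_items):
                probs[c][j] = sum(
                    counts[y] * post[y][c] * y[j] for y in patterns
                ) / mass
    return weights, probs, trace
```

Each iteration cannot decrease the log-likelihood, which is a useful check on an implementation. Extending the sketch to capture-recapture data would require handling the unobserved all-zero cell and adding a local dependence term between Census and Survey, which is where the identifiability issues described above arise.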

Text
B_Baffour_Awuah_Thesis.pdf - Version of Record
Available under License University of Southampton Thesis Licence.

More information

Published date: October 2009
Organisations: University of Southampton

Identifiers

Local EPrints ID: 72367
URI: http://eprints.soton.ac.uk/id/eprint/72367
PURE UUID: 66951294-63bb-44c9-8662-90486caf27c3
ORCID for P.W.F. Smith: orcid.org/0000-0003-4423-5410

Catalogue record

Date deposited: 10 Feb 2010
Last modified: 14 Mar 2024 02:35

Contributors

Author: Bernard Baffour-Awuah
Thesis advisor: P.W.F. Smith
Thesis advisor: J.J. Brown
