Estimation of population totals from imperfect census, survey and administrative records
The theoretical framework for estimating population totals from the Census, the Survey and an Administrative Records List is based on capture-recapture methodology, which has traditionally been used to measure the abundance of biological populations. Under this framework, to estimate the unknown population total, $N$, an initial set of individuals is captured, and further captures are taken at later periods. The possible capture histories can be represented by the cells of a $2^r$ contingency table, where $r$ is the number of captures. This table has one missing cell, corresponding to the individuals missed in all $r$ captures. If this cell count can be estimated, adding it to the sum of the observed cells yields the population size of interest. A number of models may be specified for the incomplete table of $2^r - 1$ observed counts, and if a model is found that adequately fits these counts, an estimate of the unobserved cell can be derived. The thesis concentrates on the log-linear specification of capture-recapture models.
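The tabulation step described above can be sketched in Python. This is a minimal illustration under stated assumptions: capture histories arrive as 0/1 tuples, one per observed individual, and the names `capture_table` and `population_total` are hypothetical, not taken from the thesis. It builds the incomplete table of $2^r - 1$ observable cells and shows how an estimate of the missing all-zero cell, however a model produces it, completes $\hat N$.

```python
from collections import Counter
from itertools import product


def capture_table(histories, r):
    """Tabulate capture histories into the 2^r - 1 observable cells.

    Each history is a length-r tuple of 0/1 indicators (1 = captured on
    that occasion). The all-zeros history is the unobserved cell: nobody
    missed by every capture can appear in the data, so it is excluded.
    """
    counts = Counter(tuple(h) for h in histories)
    missing = (0,) * r
    if counts[missing]:
        raise ValueError("an all-zero capture history cannot be observed")
    return {cell: counts[cell]
            for cell in product((0, 1), repeat=r)
            if cell != missing}


def population_total(table, missing_cell_estimate):
    """N-hat: the sum of the observed cells plus the estimated missing cell."""
    return sum(table.values()) + missing_cell_estimate


# Hypothetical example: four individuals observed on r = 2 lists.
table = capture_table([(1, 0), (1, 1), (0, 1), (1, 1)], r=2)
print(table)  # {(0, 1): 1, (1, 0): 1, (1, 1): 2}
```

The missing-cell estimate is deliberately left as an input: it is exactly the quantity a fitted log-linear model has to supply.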
In the simplest capture-recapture model there are two lists (for example, a Census and a Survey), leading to a $2 \times 2$ contingency table with three observed counts and one unobserved cell count. By assuming independence between the Census and the Survey, an estimate of the unobserved cell can be obtained. It will be shown that when information on individual capture is available from the Census, the Survey and a third list (the Administrative List), it is possible to account for different dependencies, specifically the association between capture in the Census and capture in the Survey. The independence assumption, which is pivotal when there are only two captures, can then be relaxed. However, the introduction of the Administrative List means that overenumeration can no longer be assumed to be negligible.
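For two lists, independence means the odds ratio of the complete $2 \times 2$ table is one, so $m_{00} m_{11} = m_{10} m_{01}$ and the missing cell is estimated by $\hat m_{00} = m_{10} m_{01} / m_{11}$ (the classical dual-system, or Lincoln-Petersen, estimate). With three lists, one log-linear model that allows a Census-Survey association is $[CS][A]$, in which capture on the Administrative List is independent of joint Census-Survey capture; its missing cell has a closed form. The sketch below assumes counts indexed by (census, survey, admin) 0/1 tuples, and the function names are hypothetical. $[CS][A]$ is only one of the models this framework admits, chosen here because its estimate can be written down directly; it ignores the overenumeration issue raised above.

```python
def dual_system_estimate(m11, m10, m01):
    """Two-list estimate of N assuming Census and Survey capture are independent.

    m11: caught by both lists, m10: Census only, m01: Survey only.
    Independence gives m00 * m11 = m10 * m01 for the missing cell m00.
    """
    if m11 <= 0:
        raise ValueError("need at least one individual caught by both lists")
    m00_hat = m10 * m01 / m11
    return m11 + m10 + m01 + m00_hat


def triple_system_cs_estimate(m):
    """Three-list estimate of N under the log-linear model [CS][A].

    m maps (census, survey, admin) 0/1 tuples to counts for the seven
    observed cells. [CS][A] allows any Census-Survey association but makes
    admin capture independent of it, so the ratio of admin-missed to
    admin-caught counts is common to every Census-Survey pattern. It is
    estimated from the three patterns where both admin cells are observed.
    """
    missed_by_admin = m[(1, 0, 0)] + m[(0, 1, 0)] + m[(1, 1, 0)]
    caught_by_admin = m[(1, 0, 1)] + m[(0, 1, 1)] + m[(1, 1, 1)]
    if caught_by_admin <= 0:
        raise ValueError("no Census/Survey captures appear on the admin list")
    m000_hat = m[(0, 0, 1)] * missed_by_admin / caught_by_admin
    return sum(m.values()) + m000_hat


print(dual_system_estimate(m11=50, m10=30, m01=20))  # 112.0

# Hypothetical counts constructed to follow [CS][A] exactly.
example = {(1, 1, 1): 10, (1, 0, 1): 20, (0, 1, 1): 10, (0, 0, 1): 5,
           (1, 1, 0): 30, (1, 0, 0): 60, (0, 1, 0): 30}
print(triple_system_cs_estimate(example))  # 180.0
```

In practice the choice between $[CS][A]$ and other three-list log-linear models would rest on how well each fits the seven observed counts.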
Therefore, the proposal is to use latent class models, in which a latent variable has two classes: one representing real enumerations and the other erroneous enumerations. The classical parameterisation of latent class models assumes local independence, meaning that the Census, Survey and Administrative List are conditionally independent given the latent variable. Consequently, when an individual's enumeration in the Census is associated with their enumeration in the Survey, this latent class model is invalid. A number of locally dependent latent class models exist, but in a triple-system setting most of them are not identifiable; that is, their solutions are not unique. The thesis therefore investigates the use of the Expectation-Maximization (EM) algorithm to fit a locally dependent (and identifiable) latent class model to capture-recapture data from three systems.
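The E and M steps of the EM algorithm can be illustrated on the classical two-class latent class model. This sketch is only that classical, locally independent model, not the locally dependent model developed in the thesis, and for simplicity it is fitted to a table in which all eight (census, survey, admin) patterns are observed; handling the unobserved all-missed cell and the Census-Survey dependence would need extensions not shown here. The function name, starting values, iteration count and example counts are illustrative assumptions. The E-step computes each pattern's posterior class probabilities; the M-step re-estimates the class weights and per-list capture probabilities as posterior-weighted proportions.

```python
import math


def latent_class_em(counts, weight0=0.6,
                    probs0=((0.8, 0.8, 0.8), (0.3, 0.3, 0.3)), n_iter=200):
    """EM for a two-class latent class model with local independence.

    counts: dict mapping 0/1 patterns (census, survey, admin) to frequencies.
    Given the class, the lists are treated as independent Bernoulli
    variables, which is the local independence assumption.
    Returns (class weights, per-class capture probabilities, log-likelihood trace).
    """
    data = {y: n for y, n in counts.items() if n > 0}
    total_n = sum(data.values())
    w = [weight0, 1.0 - weight0]
    p = [list(probs0[0]), list(probs0[1])]
    trace = []
    for _ in range(n_iter):
        # E-step: posterior probability of each class for each pattern.
        post, loglik = {}, 0.0
        for y, n in data.items():
            joint = []
            for k in range(2):
                lik = w[k]
                for j, yj in enumerate(y):
                    lik *= p[k][j] if yj else 1.0 - p[k][j]
                joint.append(lik)
            marginal = sum(joint)
            loglik += n * math.log(marginal)
            post[y] = [jk / marginal for jk in joint]
        trace.append(loglik)
        # M-step: class weights and capture probabilities as weighted proportions.
        for k in range(2):
            n_k = sum(n * post[y][k] for y, n in data.items())
            w[k] = n_k / total_n
            p[k] = [sum(n * post[y][k] * y[j] for y, n in data.items()) / n_k
                    for j in range(len(p[k]))]
    return w, p, trace


# Hypothetical complete table of capture patterns.
example = {(1, 1, 1): 300, (1, 1, 0): 60, (1, 0, 1): 60, (0, 1, 1): 60,
           (1, 0, 0): 30, (0, 1, 0): 30, (0, 0, 1): 30, (0, 0, 0): 100}
weights, probs, trace = latent_class_em(example)
```

Each EM iteration cannot decrease the log-likelihood, so `trace` should be non-decreasing; which class ends up representing real enumerations depends on the starting values.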
University of Southampton
Baffour-Awuah, Bernard
October 2009
Smith, P.W.F.
Brown, J.J.
Baffour-Awuah, Bernard (2009) Estimation of population totals from imperfect census, survey and administrative records. University of Southampton, School of Social Sciences, Doctoral Thesis, 198pp.
Record type: Thesis (Doctoral)
More information
Published date: October 2009
Organisations: University of Southampton
Identifiers
Local EPrints ID: 72367
URI: http://eprints.soton.ac.uk/id/eprint/72367
PURE UUID: 66951294-63bb-44c9-8662-90486caf27c3
Catalogue record
Date deposited: 10 Feb 2010
Last modified: 11 Dec 2021 02:53
Contributors
Author: Bernard Baffour-Awuah
Thesis advisor: J.J. Brown