Estimation of population totals from imperfect census, survey and administrative records

The theoretical framework for estimating population totals from a Census, a Survey and an Administrative Records List is based on capture-recapture methodology, which has traditionally been used to measure the abundance of biological populations. Under this framework, an initial set of individuals is captured in order to estimate the unknown population total, N, and further captures are taken at later periods. The possible capture histories can be represented by the cells of a 2^r contingency table, where r is the number of captures. This table has one missing cell, corresponding to the individuals missed in all r captures. If this cell count can be estimated, adding it to the sum of the observed cells yields the population size of interest. A number of models can be specified for the incomplete table of 2^r - 1 observed counts, and if a model is found that fits these counts adequately, an estimate of the unobserved cell can be derived from it. The thesis concentrates on the log-linear specification of capture-recapture models.
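As an illustration of the incomplete 2^r table described above, the following minimal sketch tabulates individual capture histories into the 2^r - 1 observable cells. The function name `capture_history_table` and the encoding of a history as a tuple of 0/1 flags are assumptions made for this example; they are not taken from the thesis.

```python
from collections import Counter
from itertools import product


def capture_history_table(histories, r):
    """Count capture histories in the observable cells of a 2^r table.

    Each history is a sequence of r flags (1 = captured, 0 = missed).
    The all-zero cell is excluded because individuals missed by every
    capture can never be observed.
    """
    observable = [cell for cell in product((1, 0), repeat=r) if any(cell)]
    counts = Counter(tuple(h) for h in histories)
    for cell in counts:
        if len(cell) != r or cell not in observable:
            raise ValueError(f"invalid or unobservable history: {cell}")
    # Cells with no individuals are kept with a count of zero.
    return {cell: counts.get(cell, 0) for cell in observable}


# Hypothetical records for r = 3 captures.
table = capture_history_table([(1, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1)], r=3)
```

The returned dictionary has seven entries when r = 3; the eighth cell, (0, 0, 0), is the one a fitted model would have to estimate.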

In the simplest capture-recapture model there are two lists (for example, a Census and a Survey), leading to a 2x2 contingency table with three observed counts and one unobserved cell count. By assuming that capture in the Census is independent of capture in the Survey, an estimate of the unobserved cell can be obtained. It will be shown that when individual capture information is available from the Census, the Survey and a third list (the Administrative List), it is possible to account for certain dependencies, specifically the association between capture in the Census and capture in the Survey. The independence assumption, which is pivotal when there are only two captures, can then be relaxed. However, the introduction of the Administrative List means that overenumeration cannot be assumed to be negligible.
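The two situations above can be sketched numerically. With two lists, independence implies an odds ratio of one, so the missing cell is estimated as n10 * n01 / n11. With three lists, one standard closed-form case is the log-linear model in which Census and Survey may be associated while the Administrative List is independent of both. The function names, the (census, survey, admin) cell ordering and the counts below are assumptions for illustration, and this particular three-list model is a textbook example rather than necessarily the one adopted in the thesis.

```python
def dual_system_estimate(n11, n10, n01):
    """Two-list estimate of N assuming Census and Survey are independent.

    n11: in both lists, n10: Census only, n01: Survey only.
    """
    if n11 <= 0:
        raise ValueError("the overlap count n11 must be positive")
    n00_hat = n10 * n01 / n11  # estimated count missed by both lists
    return n11 + n10 + n01 + n00_hat


def triple_system_estimate_cs(n):
    """Three-list estimate of N allowing Census-Survey association.

    n maps (census, survey, admin) 0/1 tuples to the seven observed
    counts. The Administrative List is assumed independent of the
    (Census, Survey) pair, so the ratio of "not in admin" to "in admin"
    is the same for every Census-Survey combination.
    """
    in_admin = n[(1, 1, 1)] + n[(1, 0, 1)] + n[(0, 1, 1)]
    not_in_admin = n[(1, 1, 0)] + n[(1, 0, 0)] + n[(0, 1, 0)]
    if in_admin <= 0:
        raise ValueError("need at least one admin match with Census or Survey")
    n000_hat = n[(0, 0, 1)] * not_in_admin / in_admin
    return sum(n.values()) + n000_hat


# Hypothetical counts, not data from the thesis.
two_list_total = dual_system_estimate(n11=60, n10=20, n01=30)
three_list_total = triple_system_estimate_cs({
    (1, 1, 1): 40, (1, 1, 0): 20, (1, 0, 1): 10, (0, 1, 1): 10,
    (1, 0, 0): 10, (0, 1, 0): 10, (0, 0, 1): 6,
})
```

Note that neither function makes any allowance for erroneous enumerations; both treat every record on every list as a real member of the population.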

Therefore, the proposal is to use latent class models, in which a latent variable has two classes: one representing real enumerations and the other erroneous enumerations. The classical parameterisation of latent class models assumes local independence, meaning that the Census, Survey and Administrative List are conditionally independent given the latent variable. Consequently, this latent model is invalidated when an individual’s enumeration in the Census is associated with their enumeration in the Survey. A number of locally dependent latent class models exist, but in a triple-system scenario most encounter problems of model identifiability; that is, the model solutions are not unique. The thesis therefore investigates the use of the Expectation-Maximization (EM) algorithm to fit a locally dependent (and identifiable) latent class model to capture-recapture data from three systems.
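To make the local independence assumption concrete, the classical two-class model can be written as follows. The notation is assumed for this illustration: C, S and A are the 0/1 enumeration indicators for the Census, Survey and Administrative List, X is the latent class, and m denotes expected cell counts. The final line shows one common way of adding a Census-Survey local dependence; the abstract does not state which locally dependent model the thesis adopts.

```latex
% Classical latent class model: lists independent given the class X
P(C=c, S=s, A=a) = \sum_{x=1}^{2} \pi_x \,
    P(C=c \mid X=x) \, P(S=s \mid X=x) \, P(A=a \mid X=x)

% Equivalent log-linear form for the table including the latent X
\log m_{csax} = \lambda + \lambda^{C}_{c} + \lambda^{S}_{s} + \lambda^{A}_{a}
    + \lambda^{X}_{x} + \lambda^{CX}_{cx} + \lambda^{SX}_{sx} + \lambda^{AX}_{ax}

% One locally dependent variant: add a direct Census-Survey term
\log m_{csax} = \lambda + \lambda^{C}_{c} + \lambda^{S}_{s} + \lambda^{A}_{a}
    + \lambda^{X}_{x} + \lambda^{CX}_{cx} + \lambda^{SX}_{sx} + \lambda^{AX}_{ax}
    + \lambda^{CS}_{cs}
```

Each extra term adds parameters while the number of observed cells stays at seven, which is why identifiability becomes the central concern with three lists.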

University of Southampton

Baffour-Awuah, Bernard

October 2009

Smith, P.W.F.

Brown, J.J.

Baffour-Awuah, Bernard (2009) Estimation of population totals from imperfect census, survey and administrative records. University of Southampton, School of Social Sciences, Doctoral Thesis, 198pp.

Record type: Thesis (Doctoral)