The University of Southampton
University of Southampton Institutional Repository

Automated probabilistic record linkage without classification for dual system population size estimation


Racinskij, Viktor (2024) Automated probabilistic record linkage without classification for dual system population size estimation. University of Southampton, Doctoral Thesis, 190pp.

Record type: Thesis (Doctoral)

Abstract

Population size estimation from two incomplete surveys, known as dual system estimation, requires knowing which population elements are captured in both surveys, a task that record linkage accomplishes only imperfectly. In this thesis we explore the conceptual closeness of probabilistic record linkage and dual system estimation, and develop methods for population size estimation, called linkage free dual system estimation, that seamlessly integrate the two. Unlike many existing record linkage approaches, the one developed in this thesis is purely estimation-based: it does not classify records into links and non-links, and it does not require clerical resolution of possible links. To justify the linkage free dual system estimation method theoretically, we revisit certain problematic aspects of probabilistic record linkage and propose a different way of conceptualizing record linkage models. This conceptualization takes into account the very specific sampling mechanism behind record linkage tasks, and it allows analysis of certain properties and limitations of parameter estimation in linkage models. We also introduce a special case of data blocking that bridges the gap between record linkage data and estimation with these data. Special attention is paid to associations between variables in the outcomes obtained by comparing the values of linkage variables. We also assess the identifiability of linkage models using a variety of methods from algebraic statistics. We demonstrate that when the data in both surveys are collected for the same geographical clusters, linkage free dual system estimation is feasible and can yield outputs of similar quality to conventional classification approaches that involve clerical intervention.
We also develop accompanying variance estimation methods that rely on less restrictive assumptions than existing methods. All developments are undertaken within the frequentist paradigm.
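As background for the abstract's first sentence, the simplest dual system estimator shows why the number of records found in both surveys matters. The sketch below is a minimal illustration that assumes a closed population and independent capture in the two surveys. It implements the textbook Lincoln-Petersen form N = n1 × n2 / m, not the linkage free method developed in the thesis. The function name dual_system_estimate and the counts are hypothetical.

```python
# Minimal sketch of the textbook dual system (Lincoln-Petersen) estimator.
# Assumptions: closed population, independent capture in the two surveys.
# This is background only, NOT the linkage free method developed in the thesis.

def dual_system_estimate(n1: int, n2: int, m: int) -> float:
    """Estimate population size from survey counts n1, n2 and the count m
    of records linked as captured in both surveys (hypothetical helper)."""
    if m <= 0:
        raise ValueError("at least one record must be linked across the surveys")
    if m > min(n1, n2):
        raise ValueError("the linked count cannot exceed either survey count")
    return n1 * n2 / m


# Hypothetical counts: 900 census records and 100 coverage-survey records.
n_census, n_survey = 900, 100

# Linkage recovers all 90 true matches.
print(dual_system_estimate(n_census, n_survey, 90))  # 1000.0
# Linkage misses 10 true matches (false non-links), so the estimate is inflated.
print(dual_system_estimate(n_census, n_survey, 80))  # 1125.0
```

Because m sits in the denominator, false non-links push the estimate upwards and false links push it downwards. This is why the quality of record linkage matters directly for dual system estimation.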

V_Racinskij_Doctoral_thesis_PDFA, Version of Record (text, 1MB)
Available under License University of Southampton Thesis Licence.

Final-thesis-submission-Examination-Mr-Viktor-Racinskij (text)
Restricted to Repository staff only
Available under License University of Southampton Thesis Licence.

More information

Published date: April 2024
Keywords: dual system estimation, probabilistic record linkage, justification of probabilistic record linkage, identifiability, simulated annealing, Taylor series approximation, within and between linkage variables associations, variance estimation, census and census coverage survey, simulations, no-classification record linkage, linkage free dual system estimation

Identifiers

Local EPrints ID: 489345
URI: http://eprints.soton.ac.uk/id/eprint/489345
PURE UUID: 9d6c0811-7964-48a0-b570-0bc67a52e135
ORCID for Paul Smith: orcid.org/0000-0001-5337-2746
ORCID for Peter Van Der Heijden: orcid.org/0000-0002-3345-096X

Catalogue record

Date deposited: 22 Apr 2024 16:33
Last modified: 23 Apr 2024 01:47


Contributors

Author: Viktor Racinskij
Thesis advisor: Paul Smith
Thesis advisor: Peter Van Der Heijden
