The University of Southampton
University of Southampton Institutional Repository

An overview of population size estimation where linking registers results in incomplete covariates, with an application to mode of transport of serious road casualties


Van Der Heijden, Peter, Smith, Paul, Cruyff, Maarten and Bakker, Bart (2018) An overview of population size estimation where linking registers results in incomplete covariates, with an application to mode of transport of serious road casualties. Journal of Official Statistics, 34 (1), 239-263. (doi:10.1515/jos-2018-0011).

Record type: Article

Abstract

We consider the linkage of two or more registers in the situation where the registers do not cover the whole target population, and where relevant categorical auxiliary variables are available in addition to the usual matching variable(s). Each auxiliary variable is unique to one of the registers, although different variables may be present on each register. The linked registers therefore do not contain full information on either the observations (often individuals) or the variables. By treating this as a missing data problem, it is possible to construct a linked data set that is adjusted to estimate the part of the population missed by both registers and that contains completed covariate information for all the registers. This is achieved using an Expectation-Maximization (EM) algorithm. We elucidate the properties of this approach where the model is appropriate, in situations corresponding to real applications in official statistics, and where the model conditions are violated. The approach is applied to data on road accidents in the Netherlands, where the cause of the accident is recorded by the police and by the hospital. The cause recorded by the police is treated as missing information for the statistical units registered only by the hospital, and vice versa. The method needs to be applied more widely to give a better impression of the range of problems where it can be beneficial.
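To make the idea in the abstract concrete, the following is a minimal sketch of an EM scheme for two linked registers, not the authors' implementation. It assumes two registers A and B, a covariate x1 recorded only in A and a covariate x2 recorded only in B, and a loglinear model with terms [A x2][B x1][x1 x2], which is one possible model choice and not one stated in this record. The M-step uses iterative proportional fitting, which is likewise a choice made for this sketch. All counts and function names are hypothetical.

```python
import numpy as np

def ipf_fit(table, mask, n_iter=50):
    """Fit the assumed loglinear model [A x2][B x1][x1 x2] by iterative
    proportional fitting. Arrays are indexed [a, b, x1, x2], with 1 meaning
    'in the register'. mask is 0 for the unobservable cells (a=0, b=0)."""
    fit = mask.astype(float).copy()
    for _ in range(n_iter):
        # Match the (A, x2) margin: sum over b and x1.
        fit *= (table.sum(axis=(1, 2)) / fit.sum(axis=(1, 2)))[:, None, None, :]
        # Match the (B, x1) margin: sum over a and x2.
        fit *= (table.sum(axis=(0, 3)) / fit.sum(axis=(0, 3)))[None, :, :, None]
        # Match the (x1, x2) margin: sum over a and b.
        fit *= (table.sum(axis=(0, 1)) / fit.sum(axis=(0, 1)))[None, None, :, :]
    return fit

def em_two_registers(n_both, n_a_only, n_b_only, n_em=200):
    """n_both[x1, x2]: units in both registers (both covariates known).
    n_a_only[x1]: units only in A (x2 missing).
    n_b_only[x2]: units only in B (x1 missing)."""
    k1, k2 = n_both.shape
    mask = np.ones((2, 2, k1, k2))
    mask[0, 0] = 0.0  # missed by both registers: never observed
    fit = mask.copy()
    for _ in range(n_em):
        # E-step: spread the partially classified counts over the missing
        # covariate in proportion to the current fitted values.
        completed = np.zeros((2, 2, k1, k2))
        completed[1, 1] = n_both
        share_x2 = fit[1, 0] / fit[1, 0].sum(axis=1, keepdims=True)
        completed[1, 0] = n_a_only[:, None] * share_x2
        share_x1 = fit[0, 1] / fit[0, 1].sum(axis=0, keepdims=True)
        completed[0, 1] = n_b_only[None, :] * share_x1
        # M-step: refit the loglinear model to the completed table.
        fit = ipf_fit(completed, mask)
    # With no A-by-B interaction in the model, the missed cell in each
    # (x1, x2) stratum follows from the usual dual-system formula.
    missed = fit[1, 0] * fit[0, 1] / fit[1, 1]
    return completed, missed

# Hypothetical counts, for illustration only.
n_both = np.array([[30.0, 10.0], [8.0, 25.0]])
n_a_only = np.array([40.0, 35.0])
n_b_only = np.array([20.0, 30.0])

completed, missed = em_two_registers(n_both, n_a_only, n_b_only)
n_observed = n_both.sum() + n_a_only.sum() + n_b_only.sum()
n_hat = n_observed + missed.sum()
print("estimated missed by both registers, per (x1, x2):")
print(missed)
print("estimated population size:", n_hat)
```

The completed table keeps the observed counts intact: units seen only in A are distributed over the levels of x2, units seen only in B over the levels of x1, and the estimate for the part missed by both registers is added on top.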

Text: traffic 20170721 JOS Final Clean - Manuscript - Author's Original (restricted to repository staff only)
Text: Van der Heijden et al JOS accepted 2017 - Accepted Manuscript (555kB)
Text: An Overview of Population Size Estimation where Linking Registers Results in Incomplete Covariates, with an Application to Mode of Transport of Serious Road Casualties - Version of Record (324kB)

More information

Accepted/In Press date: 1 September 2017
e-pub ahead of print date: 1 March 2018
Published date: 1 March 2018

Identifiers

Local EPrints ID: 414761
URI: http://eprints.soton.ac.uk/id/eprint/414761
ISSN: 0282-423X
PURE UUID: 29c1528e-5bfe-4fdc-9960-8550f79a3c6c
ORCID for Peter Van Der Heijden: orcid.org/0000-0002-3345-096X
ORCID for Paul Smith: orcid.org/0000-0001-5337-2746

Catalogue record

Date deposited: 10 Oct 2017 16:31
Last modified: 16 Apr 2024 04:01


Contributors

Author: Peter Van Der Heijden
Author: Paul Smith
Author: Maarten Cruyff
Author: Bart Bakker
