The University of Southampton
University of Southampton Institutional Repository

Categorical linkage-data analysis


Zhang, Li-Chun and Tuoto, Tiziana (2024) Categorical linkage-data analysis. Statistics in Medicine. (In Press)

Record type: Article

Abstract

Analysis of integrated data often requires record linkage to join data residing in separate sources. When linkage errors cannot be avoided, because there is no unique identity key that links the records unequivocally, standard statistical techniques may produce misleading inference if the linked data are treated as if they were true observations. In this paper, we propose methods for categorical data analysis based on linked data that are not prepared by the analyst, such that neither the match-key variables nor the unlinked records are available. The adjustment is based on the proportion of false links in the linked file, and our approach allows the probabilities of correct linkage to vary across the records without requiring an estimate of this probability for each individual record. It also accommodates the general situation where unmatched records that cannot possibly be correctly linked exist in all the sources. The proposed methods are studied by simulation and applied to real data.
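
To illustrate why false links matter for categorical analysis, the following minimal sketch uses a deliberately simplified toy model. It is not the method proposed in the paper. It assumes that a known, constant fraction `lam` of the links is false and that a false link pairs the two variables independently according to their marginal distributions. The function names, the 2x2 table and the value of `lam` are all hypothetical choices made for this illustration.

```python
import numpy as np


def mix_with_false_links(p_true, lam):
    """Joint distribution seen in the linked file under the toy model.

    Assumption (not from the paper): a fraction `lam` of links is false,
    and a false link pairs row and column categories independently.
    """
    row = p_true.sum(axis=1, keepdims=True)   # marginal of the first variable
    col = p_true.sum(axis=0, keepdims=True)   # marginal of the second variable
    return (1.0 - lam) * p_true + lam * (row @ col)


def adjust_for_false_links(p_linked, lam):
    """Invert the toy mixture, given the proportion of false links `lam`.

    The margins are unchanged by the mixture, so they can be taken
    directly from the linked table.
    """
    row = p_linked.sum(axis=1, keepdims=True)
    col = p_linked.sum(axis=0, keepdims=True)
    return (p_linked - lam * (row @ col)) / (1.0 - lam)


def odds_ratio(p):
    """Odds ratio of a 2x2 table of probabilities."""
    return (p[0, 0] * p[1, 1]) / (p[0, 1] * p[1, 0])


# Hypothetical true joint distribution of two binary variables.
p_true = np.array([[0.40, 0.10],
                   [0.10, 0.40]])
lam = 0.2  # hypothetical proportion of false links

p_linked = mix_with_false_links(p_true, lam)
p_adjusted = adjust_for_false_links(p_linked, lam)

print("odds ratio, true table:    ", odds_ratio(p_true))
print("odds ratio, linked table:  ", odds_ratio(p_linked))
print("odds ratio, adjusted table:", odds_ratio(p_adjusted))
```

In this toy setting, treating the linked table as if it were the true table weakens the apparent association, while the adjustment based on the proportion of false links recovers the original table. The paper addresses a more general situation, in which the probability of correct linkage varies across records and unmatched records exist in all the sources.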

Text
SIM-24-0017-R1 - Accepted Manuscript
Restricted to Repository staff only until 25 May 2025.

More information

Accepted/In Press date: 25 May 2024

Identifiers

Local EPrints ID: 490706
URI: http://eprints.soton.ac.uk/id/eprint/490706
ISSN: 0277-6715
PURE UUID: 96bfaec4-3fb2-45ec-a5f8-4eba26c5d8ed
ORCID for Li-Chun Zhang: orcid.org/0000-0002-3944-9484

Catalogue record

Date deposited: 04 Jun 2024 16:35
Last modified: 05 Jun 2024 01:45

Contributors

Author: Li-Chun Zhang
Author: Tiziana Tuoto
