Correlates of record linkage and estimating risks of non-linkage biases in business datasets
Researchers often utilise datasets that link information from multiple sources, but non-linkage biases caused by linked and non-linked subject differences are little understood, especially in business datasets. We address these knowledge gaps by studying biases in linkable 2010 UK Small Business Survey datasets. We identify correlates of business linkage propensity, and also for the first time its components: consent to linkage and register identifier appendability. As well, we take a novel approach to evaluating non-linkage bias risks, by computing dataset representativeness indicators (comparable, decomposable sample-subset similarity measures). We find that the main impacts on linkage propensities and bias risks are due to consenter / non-consenter differences explicable given business survey response processes, and differences between subjects with and without identifiers caused by register under-coverage of very small businesses. We then discuss consequences for the analysis of linked business datasets, and implications of the evaluation methods we introduce for linked dataset producers and users.
Moore, Jamie
Durrant, Gabriele
Smith, Peter W F
Moore, Jamie, Durrant, Gabriele and Smith, Peter W F (2017) Correlates of record linkage and estimating risks of non-linkage biases in business datasets. Journal of the Royal Statistical Society: Series A (Statistics in Society). (doi:10.1111/rssa.12342).
More information
Accepted/In Press date: 30 October 2017
e-pub ahead of print date: 6 December 2017
Identifiers
Local EPrints ID: 415232
URI: http://eprints.soton.ac.uk/id/eprint/415232
ISSN: 0964-1998
PURE UUID: 7c8bf6cd-ffa5-4fba-b1fb-2ebabd370fdd
Catalogue record
Date deposited: 03 Nov 2017 17:30
Last modified: 18 May 2024 01:36
Contributors
Author: Jamie Moore