Zone design for statistical disclosure control in administrative and linked microdata
Robards, James
Martin, David
Gale, Chris
Robards, James, Martin, David and Gale, Chris (2016) Zone design for statistical disclosure control in administrative and linked microdata. 2016 British Society for Population Studies Conference, Winchester, United Kingdom. 11 - 13 Sep 2016.
Record type: Conference or Workshop Item (Other)
Abstract
The increase in spatially referenced administrative and linked datasets presents growing challenges for statistical disclosure control. Such data typically combine detailed attributes with a large volume of records, increasing the risk that individuals can be identified and information about them disclosed. Detailed spatial information may be important to the researcher but also increases this risk. This paper is concerned with the application of automated zone design tools to protect record-level datasets in a way that might be implemented by a data provider. Such an implementation could facilitate the release of richer data to researchers, preserving small-area geographical associations while not revealing actual locations. Using a synthetic microdataset of individual records with locality-level (MSOA) geography codes for England and Wales (variables: age, gender, economic activity, marital status, occupation, number of hours worked and general health), we synthesize address-level locations with reference to 2011 Census headcount data. These synthetic locations are then associated with a range of spatial measures and indicators (e.g. distance to GP). Implementation of the AZTool zone design software enables a bespoke, non-disclosive zone design solution, providing area codes that can be added to the research data without revealing actual locations to the researcher. Results will describe the spatial characteristics of the new synthetic dataset (which may have broader utility) and show how disclosure risk and utility change when records are coded to spatial units of different scales and aggregations. Because the dataset is synthetic, the approach can be demonstrated for a variety of linked and administrative data without any disclosure risk.
This record has no associated files available for download.
More information
Submitted date: 11 April 2016
e-pub ahead of print date: 14 September 2016
Venue - Dates: 2016 British Society for Population Studies Conference, Winchester, United Kingdom, 2016-09-11 - 2016-09-13
Organisations: Social Statistics & Demography
Identifiers
Local EPrints ID: 403266
URI: http://eprints.soton.ac.uk/id/eprint/403266
PURE UUID: b35a13a7-809c-4d3d-8a15-5dda9559b97c
Catalogue record
Date deposited: 29 Nov 2016 13:43
Last modified: 12 Dec 2021 02:46
Contributors
Author: James Robards
Author: David Martin
Author: Chris Gale