The University of Southampton
University of Southampton Institutional Repository

Zone design for statistical disclosure control in administrative and linked microdata

Objectives: To explore the application of automated zone design tools to protect large, attribute-rich record-level datasets in a way that a data provider (e.g. a National Statistical Organisation or Health Service Provider) might implement, initially using a synthetic microdataset. Successful implementation could allow rich linked record datasets to be released to researchers with small-area geographical associations preserved but actual locations concealed. These associations are currently lost because data providers require records to be coded to a high level of geography before release. Demand for new linked administrative datasets, and for suitable research environments, has been driven by the need for detailed information on certain spatial attributes (e.g. distance to a medical practitioner, exposure to the local environment), which makes data perturbation undesirable. The outcome is a bespoke aggregation of the microdata that meets a set of design constraints, but whose exact configuration is never revealed. Researchers receive detailed data and suitable geographies with appropriately reduced disclosure risk.

Approach: We start from a synthetic flat-file microdataset of individual records for England and Wales with locality-level (MSOA) geography codes (variables: age, gender, economic activity, marital status, occupation, number of hours worked and general health), and synthesize address-level locations within MSOAs using 2011 Census headcount data. These synthetic locations are then associated with a range of spatial measures and indicators, such as distance to a medical practitioner. The AZTool zone design software is then used to produce a bespoke, non-disclosive zone design solution, providing area codes that can be added to the research data without revealing the records' true locations to the researcher.
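
As an illustration only, the step of placing records within an MSOA in proportion to census headcounts could be sketched as follows. The abstract does not describe the actual allocation procedure, so the function name `allocate_records`, the unit labels and the headcount values are all hypothetical, and simple headcount-weighted random sampling is an assumption rather than the authors' method.

```python
import random
from collections import Counter


def allocate_records(record_ids, headcounts, seed=0):
    """Assign each record to a sub-area unit with probability
    proportional to that unit's headcount (assumed approach)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    units = list(headcounts)
    weights = [headcounts[u] for u in units]
    chosen = rng.choices(units, weights=weights, k=len(record_ids))
    return dict(zip(record_ids, chosen))


# Hypothetical headcounts for three small units inside one MSOA.
headcounts = {"unit_a": 120, "unit_b": 60, "unit_c": 0}

allocation = allocate_records(range(1000), headcounts)
counts = Counter(allocation.values())
print(counts)
```

Units with a zero headcount receive no records, and more populous units receive proportionally more, which is the property a headcount-based synthesis would need at minimum.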

Results: Two sets of results will be presented. First, we will describe the spatial characteristics of the new synthetic dataset, which we propose may have broader utility. Second, we will present results showing how disclosure risk and utility change when records are coded to spatial units of different scales and aggregations. Because the dataset is synthetic, the approach can be demonstrated for a variety of linked and administrative data without any actual disclosure risk.
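
The abstract does not state which disclosure risk measure is used. One common choice, assumed here purely for illustration, is the share of records whose combination of attributes is unique within their zone. The sketch below compares that share when the same toy records are coded to a small-zone and a large-zone geography; the function `unique_share`, the field names and the records are all hypothetical.

```python
from collections import Counter


def unique_share(records, zone_key, attrs):
    """Share of records whose attribute combination is unique
    within their zone (an assumed, simple risk indicator)."""
    def key(r):
        return (r[zone_key],) + tuple(r[a] for a in attrs)

    combos = Counter(key(r) for r in records)
    unique = sum(1 for r in records if combos[key(r)] == 1)
    return unique / len(records)


# Hypothetical records coded to two alternative geographies.
records = [
    {"small": "S1", "large": "L1", "age": 30, "gender": "F"},
    {"small": "S2", "large": "L1", "age": 30, "gender": "F"},
    {"small": "S2", "large": "L1", "age": 45, "gender": "M"},
    {"small": "S3", "large": "L1", "age": 45, "gender": "M"},
]

risk_small = unique_share(records, "small", ["age", "gender"])
risk_large = unique_share(records, "large", ["age", "gender"])
print(risk_small, risk_large)  # 1.0 0.0
```

In this toy case every record is unique under the small zones and none is unique under the single large zone, which shows the kind of risk and detail trade-off that varying scale and aggregation would be expected to produce.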

Conclusions: This approach is applicable to a variety of datasets. We will discuss the ability to quantify the zone design solution and its security in relation to statistical disclosure control, and consider the provision of zone design parameters to the data user and the implications of this for security and for data users.

Robards, James, Martin, David and Gale, Chris (2016) Zone design for statistical disclosure control in administrative and linked microdata. At 2016 International Population Data Linkage Conference, United Kingdom. 24 - 26 Aug 2016.

Record type: Conference or Workshop Item (Other)

Full text not available from this repository.

More information

Submitted date: 3 March 2016
e-pub ahead of print date: August 2016
Venue - Dates: 2016 International Population Data Linkage Conference, United Kingdom, 2016-08-24 - 2016-08-26
Keywords: zone design, administrative data, census data, statistical disclosure control.
Organisations: Social Statistics & Demography

Identifiers

Local EPrints ID: 403267
URI: https://eprints.soton.ac.uk/id/eprint/403267
PURE UUID: df7ab28e-4e07-450b-87be-6fd686dcea60
ORCID for James Robards: orcid.org/0000-0003-4784-5679
ORCID for David Martin: orcid.org/0000-0003-0397-0769

Catalogue record

Date deposited: 29 Nov 2016 13:48
Last modified: 06 Jun 2018 13:09

Contributors

Author: James Robards
Author: David Martin
Author: Chris Gale
