Protection of micro-data subject to edit constraints against statistical disclosure
Before releasing statistical outputs, data suppliers must assess whether the privacy of statistical units is endangered and, if necessary, apply Statistical Disclosure Control (SDC) methods. SDC methods perturb, modify or summarize the data, depending on whether the data are released as micro-data or as tabular data. The goal is to choose an optimal method that manages disclosure risk while ensuring high-quality statistical data. In this article we discuss the effect of applying basic SDC methods to continuous and categorical variables for data masking. Perturbative SDC methods alter the data in some way. Changing values, however, will likely distort totals and other sufficient statistics, and will also cause fully edited records in micro-data to fail edit constraints, resulting in low-quality data. Moreover, an inconsistent record signals that the record has been perturbed for disclosure control, and attempts can then be made to unmask the data. To deal with these problems, we develop new strategies for implementing basic perturbation methods that are often used at statistical agencies; these strategies minimize record-level edit failures as well as overall measures of information loss.
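The edit-failure problem described in the abstract can be illustrated with a small sketch. It is not the strategy developed in the article: the variable names, the additive edit rule (wages + other_income = total_income), the noise scale and the "re-derive the total" repair are all assumptions chosen only to show how independent additive noise breaks a record-level edit, and how a perturbation that respects the edit avoids the failure.

```python
import random

# Hypothetical edit constraint: wages + other_income must equal total_income.
def fails_edit(record, tol=1e-9):
    return abs(record["wages"] + record["other_income"] - record["total_income"]) > tol

def add_noise_naive(record, sd, rng):
    """Perturb every variable independently with zero-mean Gaussian noise."""
    return {name: value + rng.gauss(0.0, sd) for name, value in record.items()}

def add_noise_consistent(record, sd, rng):
    """Perturb only the components, then re-derive the total so the edit still holds."""
    out = {
        "wages": max(0.0, record["wages"] + rng.gauss(0.0, sd)),
        "other_income": max(0.0, record["other_income"] + rng.gauss(0.0, sd)),
    }
    out["total_income"] = out["wages"] + out["other_income"]
    return out

rng = random.Random(0)  # fixed seed so the illustration is reproducible

# Toy, fully edited records (made-up values).
records = [
    {"wages": 30000.0, "other_income": 5000.0, "total_income": 35000.0},
    {"wages": 42000.0, "other_income": 1500.0, "total_income": 43500.0},
    {"wages": 18000.0, "other_income": 9000.0, "total_income": 27000.0},
]

naive = [add_noise_naive(r, 500.0, rng) for r in records]
consistent = [add_noise_consistent(r, 500.0, rng) for r in records]

naive_failures = sum(fails_edit(r) for r in naive)
consistent_failures = sum(fails_edit(r) for r in consistent)

print("edit failures, independent noise:", naive_failures)
print("edit failures, edit-preserving noise:", consistent_failures)
```

Re-deriving the total keeps every record consistent, but the total's noise is then the sum of the component noises, so its distortion grows. This is the kind of trade-off between edit failures and information loss that the abstract refers to.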
Keywords: information loss, additive noise, micro-aggregation, post-randomization method, rank swapping, rounding
Pages: 229-253
Authors: Natalie Shlomo, Ton De Waal
Published date: 2008
Shlomo, Natalie and De Waal, Ton (2008) Protection of micro-data subject to edit constraints against statistical disclosure. Journal of Official Statistics, 24 (2), 229-253.
More information
Published date: 2008
Identifiers
Local EPrints ID: 51967
URI: http://eprints.soton.ac.uk/id/eprint/51967
ISSN: 0282-423X
PURE UUID: 2b086ded-5e3d-441e-af4e-653e2fdf2931
Catalogue record
Date deposited: 07 Jul 2008
Last modified: 11 Dec 2021 17:11
Contributors
Author: Natalie Shlomo
Author: Ton De Waal