Secure big data collection and processing: framework, means and opportunities
Statistical disclosure control is important for the dissemination of statistical outputs. There is an increasing need for greater confidentiality protection during data collection and processing by National Statistical Offices. In particular, various transactions and remote sensing signals are examples of useful but very detailed big data that can be highly sensitive. Moreover, possible conflicts of interest may arise for data suppliers who operate commercially. In this paper, we formulate statistical disclosure control for data collection and processing as an optimisation problem. Even when it is difficult to specify and solve the problem unequivocally, the formulation can still provide the basis for comparing different disclosure control methods. We develop a general compartmented system that adapts and implements non-perturbative methods in the related fields of linking sensitive data and secure computation. We illustrate how the system can be configured to yield variously required tables and microdata sets with sufficiently low disclosure risks.
Non-survey big data, statistical disclosure control, confidentiality protection, trusted execution environment
1541–1559
Zhang, Li-Chun
a5d48518-7f71-4ed9-bdcb-6585c2da3649
Haraldsen, Gustav
bde26eec-8298-4ba1-952e-42445239763a
1 October 2022
Zhang, Li-Chun and Haraldsen, Gustav (2022) Secure big data collection and processing: framework, means and opportunities. Journal of the Royal Statistical Society: Series A (Statistics in Society), 185 (4), 1541–1559. (doi:10.1111/rssa.12836).
Text
secureCollectionProcessing-ZhangHaraldsen-Accepted
- Accepted Manuscript
More information
Accepted/In Press date: 14 February 2022
e-pub ahead of print date: 25 March 2022
Published date: 1 October 2022
Identifiers
Local EPrints ID: 454993
URI: http://eprints.soton.ac.uk/id/eprint/454993
ISSN: 0964-1998
PURE UUID: ee70669c-61b9-4ddc-b659-d6f6be729846
Catalogue record
Date deposited: 03 Mar 2022 17:37
Last modified: 17 Mar 2024 07:08
Contributors
Author: Li-Chun Zhang
Author: Gustav Haraldsen