Secure big data collection and processing: framework, means and opportunities

Zhang, Li-Chun and Haraldsen, Gustav (2022) Secure big data collection and processing: framework, means and opportunities. Journal of the Royal Statistical Society: Series A (Statistics in Society), 185 (4), 1541–1559. (doi:10.1111/rssa.12836).

Record type: Article

Abstract

Statistical disclosure control is important for the dissemination of statistical outputs. There is an increasing need for greater confidentiality protection during data collection and processing by National Statistical Offices. In particular, various transactions and remote sensing signals are examples of useful but very detailed big data that can be highly sensitive. Moreover, possible conflicts of interest may arise for data suppliers who operate commercially. In this paper, we formulate statistical disclosure control for data collection and processing as an optimisation problem. Even when it is difficult to specify and solve the problem unequivocally, the formulation can still provide the basis for comparing different disclosure control methods. We develop a general compartmented system that adapts and implements non-perturbative methods in the related fields of linking sensitive data and secure computation. We illustrate how the system can be configured to yield variously required tables and microdata sets with sufficiently low disclosure risks.


secureCollectionProcessing-ZhangHaraldsen-Accepted - Accepted Manuscript

Available under License University of Southampton Accepted Manuscript Licence.
