Skinner, Chris and Shlomo, Natalie
Assessing identification risk in survey microdata using log-linear models. Southampton, UK, University of Southampton, Southampton Statistical Sciences Research Institute, 36pp.
(S3RI Methodology Working Papers, M06/14).
This article considers the assessment of the risk of identification of respondents in survey microdata, in the context of applications at the United Kingdom (UK) Office for National Statistics (ONS). The threat comes from the matching of categorical 'key' variables between microdata records and external data sources and from the use of log-linear models to facilitate matching. While the potential use of such statistical models is well-established in the literature, little consideration has been given either to model specification or to the sensitivity of risk assessment to this specification. In this article we develop new criteria for assessing the specification of a log-linear model in relation to the accuracy of risk estimates. We find that, within a class of 'reasonable' models, risk estimates tend to decrease as the complexity of the model increases. We develop criteria to detect 'underfitting' (associated with overestimation of the risk). The criteria may also reveal 'overfitting' (associated with underestimation), although less clearly, so we suggest employing a forward model selection approach. We show how our approach may be used for both file-level and record-level measures of risk. We evaluate the proposed procedures using samples drawn from the 2001 UK Census, where the true risks can be determined. We also apply our approach to a large survey dataset.
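To make the idea of a model-based, file-level risk measure concrete, here is a minimal sketch. The abstract gives no formulas, so everything below is an assumption for illustration: population cell counts are taken to be Poisson with mean lambda_k, the sample is drawn by Bernoulli sampling with a known fraction `pi`, and the log-linear model is the simplest one (main effects only, i.e. independence) for two key variables, whose fitted values have a closed form. Under these assumptions, for a sample-unique cell, P(F_k = 1 | f_k = 1) = exp(-lambda_k (1 - pi)) and E(1/F_k | f_k = 1) = (1 - exp(-lambda_k (1 - pi))) / (lambda_k (1 - pi)). The function names and the toy table are hypothetical.

```python
import math

def independence_fit(table):
    """Fitted sample counts under the main-effects (independence)
    log-linear model for a two-way table: mu_ij = row_i * col_j / n."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_tot] for r in row_tot]

def file_level_risk(table, fitted, pi):
    """Sum two per-record risk measures over sample-unique cells.

    Assumes (illustrative, not taken from the abstract):
      F_k ~ Poisson(lambda_k), Bernoulli sampling with fraction pi,
      so the fitted sample mean mu_k estimates pi * lambda_k.
    Returns (tau1, tau2):
      tau1 = sum of P(F_k = 1 | f_k = 1)   (expected number of population uniques)
      tau2 = sum of E(1 / F_k | f_k = 1)   (expected number of correct matches)
    """
    tau1 = 0.0
    tau2 = 0.0
    for obs_row, fit_row in zip(table, fitted):
        for f, mu in zip(obs_row, fit_row):
            if f != 1:
                continue  # only sample uniques contribute
            rate = mu * (1 - pi) / pi  # estimate of lambda_k * (1 - pi)
            tau1 += math.exp(-rate)
            # The limit of (1 - exp(-x)) / x as x -> 0 is 1.
            tau2 += (1 - math.exp(-rate)) / rate if rate > 0 else 1.0
    return tau1, tau2

# Toy sample table of counts for two categorical key variables.
table = [[5, 1, 0],
         [1, 3, 2],
         [0, 1, 4]]
fitted = independence_fit(table)
tau1, tau2 = file_level_risk(table, fitted, pi=0.1)
print(f"tau1 = {tau1:.4f}, tau2 = {tau2:.4f}")
```

Swapping `independence_fit` for the fitted values of a richer model (for example, one with interaction terms) changes only the `fitted` argument, which is the sense in which the risk estimate depends on the model specification.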