Proxy expenditure weights for Consumer Price Index: audit sampling inference for big-data statistics
Purchase data from retail chains can provide proxy measures of private household expenditure on items that are the most troublesome to collect in the traditional expenditure survey. Due to the inevitable coverage and selection errors, bias must exist in these proxy measures. Moreover, given the sheer amount of data, the bias completely dominates the variance. To investigate the potential of replacing costly and burdensome surveys by non-survey big-data sources, we propose an audit sampling inference approach, which does not require linking the audit sample and the big-data source at the individual level. It turns out that one is unable to reject a null hypothesis of unbiased big-data estimation at the chosen size, because the audit sampling variance is too large compared to the bias of the big-data estimate. For the same reason, audit sampling fails to yield a meaningful mean squared error estimate. We propose a novel accuracy measure that is generally applicable in such situations. This can provide a necessary part of the statistical argument for the uptake of non-survey big-data sources, in replacement of traditional survey sampling. An application to disaggregated food price indices is used to demonstrate the proposed approach.
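The reasoning about the failed test and the unusable mean squared error estimate can be illustrated with a small numerical sketch. This is not the paper's procedure or its proposed accuracy measure. It assumes a simple z-test of the difference between a big-data estimate and an unbiased audit-sample estimate, treats the big-data variance as negligible, and uses a naive bias-corrected MSE estimate (squared difference minus audit variance). The function name `audit_check` and all numbers are hypothetical.

```python
import math

def audit_check(big_data_estimate, audit_estimate, audit_variance, z_crit=1.96):
    """Illustrative check of a big-data estimate against an audit sample.

    Assumptions (not taken from the paper): the audit estimate is unbiased,
    the big-data variance is negligible, and a normal approximation holds.
    """
    # Observed discrepancy between the two estimates
    diff = big_data_estimate - audit_estimate
    # z-statistic for H0: the big-data estimate is unbiased
    z = diff / math.sqrt(audit_variance)
    reject = abs(z) > z_crit
    # Naive MSE estimate: E[diff^2] = bias^2 + audit variance,
    # so bias^2 is estimated by diff^2 - audit variance
    mse_naive = diff ** 2 - audit_variance
    return {"diff": diff, "z": z, "reject_unbiased": reject, "mse_naive": mse_naive}

# Hypothetical expenditure shares: the audit standard error (0.03)
# is three times the observed discrepancy (0.01)
result = audit_check(big_data_estimate=0.210, audit_estimate=0.200,
                     audit_variance=0.03 ** 2)
print(result)
```

With these made-up inputs the null hypothesis is not rejected and the naive MSE estimate is negative, which is the kind of situation in which an alternative accuracy measure is needed.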
Keywords: evaluation coverage, privacy protection, proxy source effect, survey burden and cost
Zhang, Li-Chun
Zhang, Li-Chun (2020) Proxy expenditure weights for Consumer Price Index: audit sampling inference for big-data statistics. Journal of the Royal Statistical Society: Series A (Statistics in Society), 0, [rssa.12632]. (doi:10.1111/rssa.12632).
Text: proxy CPI weights r1 (1) (Accepted Manuscript)
More information
Accepted/In Press date: 20 June 2020
e-pub ahead of print date: 25 November 2020
Identifiers
Local EPrints ID: 442272
URI: http://eprints.soton.ac.uk/id/eprint/442272
ISSN: 0964-1998
PURE UUID: feb980cd-c753-4b59-b768-69c22eeec488
Catalogue record
Date deposited: 10 Jul 2020 16:31
Last modified: 17 Mar 2024 05:42