The University of Southampton
University of Southampton Institutional Repository

On proxy variables and categorical data fusion


Zhang, Li-Chun (2015) On proxy variables and categorical data fusion. Journal of Official Statistics, 31 (4), 783-807. (doi:10.1515/jos-2015-0045).

Record type: Article

Abstract

The problem of inference about the joint distribution of two categorical variables based on knowledge or observations of their marginal distributions, to be referred to as categorical data fusion in this paper, is relevant in statistical matching, ecological inference, market research, and several other related fields. This article organizes the use of proxy variables, to be distinguished from other auxiliary variables, both in terms of their effects on the uncertainty of fusion and the techniques of fusion. A measure of the gains of efficiency is provided, which incorporates both the identification uncertainty associated with data fusion and the sampling uncertainty that arises when the theoretical bounds of the uncertainty space are unknown and need to be estimated. Several existing techniques for generating fusion distributions (or datasets) are described and some new ones proposed. Analysis of real-life data demonstrates empirically that proxy variables can make data fusion more precise and the constructed fusion distribution more plausible.

Text
jos-2015-0045-published.pdf - Version of Record
Available under License Other.
Download (441kB)

More information

Accepted/In Press date: 1 September 2015
Published date: 16 December 2015
Keywords: identification problem, sampling uncertainty, uncertainty analysis, fusion distribution, fusion data, proxy variable, relative efficiency
Organisations: Social Statistics & Demography

Identifiers

Local EPrints ID: 391010
URI: http://eprints.soton.ac.uk/id/eprint/391010
ISSN: 0282-423X
PURE UUID: 20e560f7-8d1b-4c3e-8563-53debf839695
ORCID for Li-Chun Zhang: orcid.org/0000-0002-3944-9484

Catalogue record

Date deposited: 06 Apr 2016 13:41
Last modified: 15 Mar 2024 03:45
