The University of Southampton
University of Southampton Institutional Repository

Statistical estimation and inference with aggregated and displaced georeferenced data


Hossain, Md Jamal (2023) Statistical estimation and inference with aggregated and displaced georeferenced data. University of Southampton, Doctoral Thesis, 159pp.

Record type: Thesis (Doctoral)

Abstract

This thesis addresses statistical inference for georeferenced data whose coordinates have been randomly displaced, or both aggregated and randomly displaced, as is often done to preserve respondents' confidentiality. The distortion this induces in the location of the observations may compromise the validity of location-dependent estimates. The thesis explores four situations where this trade-off between confidentiality and validity arises: 1) population density estimation, 2) estimation of health and demographic indicators for lower geographical domains, 3) regression analyses involving lower geographical domains (e.g., within the context of multilevel models), and 4) regression analyses incorporating spatial covariates calculated or linked from external data sources based on geocoordinates. A measurement error model (MEM) is developed for the case of random displacement or aggregation. It is demonstrated under the Demographic and Health Survey (DHS) random displacement process by devising a new probability distribution for the displaced coordinates. Two methods, Kernel Density Estimation-based (KDE) and External Data-based Classification (EDC), are proposed within the MEM framework to approximate the conditional distribution of the true coordinates given the displaced ones. A further method, KDE-ED, combines KDE and EDC to handle both aggregation and random displacement when approximating this conditional distribution. The KDE method approximates the unknown marginal distribution of the true coordinates with kernel density estimates and is implemented using the Stochastic Expectation-Maximization (SEM) algorithm. The EDC method approximates the marginal distribution of the true coordinates using external data sources and carries out estimation by numerical integration.
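To make the displacement process concrete, the following is a minimal sketch of a DHS-style random displacement on planar coordinates measured in kilometres. It assumes the commonly documented DHS limits (up to 2 km for urban clusters, up to 5 km for rural clusters, and up to 10 km for roughly 1% of rural clusters), a uniformly drawn angle and a uniformly drawn distance. The function name `displace` and the simulated points are hypothetical, the restriction that displaced points stay inside their administrative area is omitted, and none of this is taken from the thesis itself.

```python
import math
import random


def displace(points, urban_flags, rng):
    """Displace planar (x, y) points in km, one random shift per point.

    Assumed limits (not from the thesis): 2 km urban, 5 km rural,
    10 km for about 1% of rural points.
    """
    displaced = []
    for (x, y), urban in zip(points, urban_flags):
        if urban:
            max_km = 2.0
        elif rng.random() < 0.01:
            max_km = 10.0
        else:
            max_km = 5.0
        angle = rng.uniform(0.0, 2.0 * math.pi)  # random direction
        dist = rng.uniform(0.0, max_km)          # random distance (assumed uniform)
        displaced.append((x + dist * math.cos(angle), y + dist * math.sin(angle)))
    return displaced


rng = random.Random(0)
true_pts = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(1000)]
urban = [rng.random() < 0.3 for _ in range(1000)]
obs_pts = displace(true_pts, urban, rng)

# Distance between each true location and its displaced counterpart.
shifts = [math.dist(p, q) for p, q in zip(true_pts, obs_pts)]
```

Only `obs_pts` would be available to an analyst; methods such as KDE and EDC aim to approximate the distribution of the unobserved `true_pts` given these displaced coordinates.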
The MEM and the KDE and EDC methods are used to address all four location-dependent estimation problems listed above. KDE and EDC can be used directly to estimate population densities or domain parameters while accounting for random displacement, or for both aggregation and displacement errors. In addition to the EDC (or KDE)-based algorithm, a new method incorporating a parametric bootstrap bias correction (BC) is proposed to obtain improved estimates of the parameters of the linear mixed model, correcting for the misplacement error caused by random displacement. Furthermore, EDC (or KDE) can be used within regression calibration (EDC-RC) to improve the estimation of spatial covariate effects in a linear regression model under random displacement. For this last situation, an alternative estimator that applies only a non-parametric bootstrap bias correction to the usual OLS estimators is also proposed. The performance of all the estimators developed, and of the variance estimators proposed for them, is assessed through simulation exercises and illustrated using real data from the 2011 Bangladesh DHS.
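As a rough illustration of the bootstrap bias correction idea mentioned above, the sketch below applies the generic textbook correction to an OLS slope: the bias is estimated as the mean of the bootstrap replicates minus the original estimate, and that estimate is subtracted from the original. This is not the thesis's estimator, whose treatment of displacement is not described in the abstract; the helper names `ols_slope` and `bootstrap_bias_corrected` and the simulated data are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the simple OLS regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def bootstrap_bias_corrected(x, y, n_boot, rng):
    """Return (OLS slope, bias-corrected slope) via a pairs bootstrap."""
    est = ols_slope(x, y)
    n = len(x)
    reps = []
    for _ in range(n_boot):
        # Resample (x, y) pairs with replacement and refit.
        idx = [rng.randrange(n) for _ in range(n)]
        reps.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
    bias = statistics.fmean(reps) - est  # bootstrap estimate of the bias
    return est, est - bias               # equivalently 2 * est - mean(reps)


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
y = [1.0 + 2.0 * a + rng.gauss(0.0, 0.5) for a in x]
est, est_bc = bootstrap_bias_corrected(x, y, 500, rng)
```

With error-free covariates, as simulated here, the correction is negligible. Correcting displacement-induced bias would presumably require the bootstrap step to reflect the displacement mechanism, which this generic sketch does not attempt.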

Text
PhD-Thesis-Jamal-Hossain-PDF-A - Version of Record
Available under License University of Southampton Thesis Licence.

More information

Published date: November 2023

Identifiers

Local EPrints ID: 484015
URI: http://eprints.soton.ac.uk/id/eprint/484015
PURE UUID: 979346ca-5e70-450c-8f0e-1c2d18e0118e
ORCID for Md Jamal Hossain: orcid.org/0000-0002-2728-1055
ORCID for Nikos Tzavidis: orcid.org/0000-0002-8413-8095
ORCID for Angela Luna Hernandez: orcid.org/0000-0001-8629-1918
ORCID for Li-Chun Zhang: orcid.org/0000-0002-3944-9484

Catalogue record

Date deposited: 09 Nov 2023 17:30
Last modified: 18 Mar 2024 03:28

Contributors

Author: Md Jamal Hossain
Thesis advisor: Nikos Tzavidis
Thesis advisor: Angela Luna Hernandez
Thesis advisor: Li-Chun Zhang
