Statistical estimation and inference with aggregated and displaced georeferenced data
This thesis addresses statistical inference for georeferenced data whose coordinates have been randomly displaced, or both aggregated and randomly displaced. Such procedures are often applied to preserve respondents' confidentiality, but the distortion they induce in the location of the observations may compromise the validity of location-dependent estimates. The thesis explores situations where this trade-off may arise, including: 1) population density estimation, 2) estimation of health and demographic indicators for lower geographical domains, 3) regression analyses involving lower geographical domains (e.g., within the context of multilevel models), and 4) regression analyses incorporating spatial covariates calculated or linked from external data sources based on geocoordinates. A measurement error model (MEM) is developed for the case of random displacement or aggregation. It is demonstrated under the Demographic and Health Survey (DHS) random displacement process by devising a new probability distribution for the displaced coordinates. Two methods, Kernel Density Estimation-based (KDE) and External Data-based Classification (EDC), are proposed within the MEM framework to approximate the conditional distribution of the true coordinates given the displaced ones. Additionally, a novel method, KDE-ED, combines KDE and EDC to approximate this conditional distribution when the coordinates are subject to both aggregation and random displacement. The KDE method approximates the unknown marginal distribution of the true coordinates with kernel density estimates and is implemented using the Stochastic Expectation-Maximization (SEM) algorithm. The EDC method approximates the marginal distribution of the true coordinates using external data sources and carries out estimation through numerical integration.
The MEM, together with the KDE and EDC methods, is used to address all four location-dependent estimation problems listed above. KDE and EDC can be used directly to estimate population densities or domain parameters while accounting for random displacement, or for both aggregation and displacement errors. In addition to the EDC (or KDE)-based algorithm, a new method incorporating a parametric bootstrap bias correction (BC) is proposed to obtain improved estimates of the parameters of the linear mixed model, correcting for the misplacement error caused by random displacement. Furthermore, EDC (or KDE) can be used within regression calibration (EDC-RC) to improve the estimation of spatial covariate effects in a linear regression model under random displacement. For this last situation, an alternative estimator that applies only a non-parametric bootstrap bias correction to the usual OLS estimators is also proposed. The performance of all estimators developed, and of the variance estimators proposed for them, is assessed via simulation exercises and illustrated using real data from the 2011 Bangladesh DHS.
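As a rough illustration of the kind of random displacement the abstract refers to, the sketch below simulates a DHS-style displacement of cluster coordinates. It is not taken from the thesis. The displacement limits (up to 2 km for urban clusters, up to 5 km for rural clusters, and up to 10 km for about 1% of rural clusters) follow the publicly documented DHS procedure as commonly described. The planar kilometre coordinates, the function name `displace_cluster` and the constants are assumptions made for illustration, and the DHS restriction that a displaced point stays within its administrative area is omitted.

```python
import math
import random

# Assumed displacement limits in kilometres (DHS-style, illustrative only).
URBAN_MAX_KM = 2.0
RURAL_MAX_KM = 5.0
RURAL_EXTENDED_MAX_KM = 10.0
RURAL_EXTENDED_SHARE = 0.01  # assumed share of rural clusters with the larger limit


def displace_cluster(x_km, y_km, urban, rng):
    """Return a randomly displaced copy of one cluster location.

    Coordinates are treated as planar and measured in kilometres
    (an assumption; real DHS coordinates are latitude/longitude).
    """
    if urban:
        max_dist = URBAN_MAX_KM
    elif rng.random() < RURAL_EXTENDED_SHARE:
        max_dist = RURAL_EXTENDED_MAX_KM
    else:
        max_dist = RURAL_MAX_KM

    # Random direction and random distance up to the applicable limit.
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dist = rng.uniform(0.0, max_dist)
    return x_km + dist * math.cos(angle), y_km + dist * math.sin(angle)


if __name__ == "__main__":
    rng = random.Random(2023)
    true_points = [(10.0, 20.0, True), (55.0, 42.0, False)]
    for x, y, urban in true_points:
        dx, dy = displace_cluster(x, y, urban, rng)
        moved = math.hypot(dx - x, dy - y)
        print(f"urban={urban} displaced by {moved:.2f} km")
```

Only the displaced coordinates would be released to an analyst, which is why methods such as those in the thesis work with the conditional distribution of the true location given the displaced one.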
University of Southampton
Hossain, Jamal
3b4f5a47-c0a3-407b-88c0-ec936e70faf3
November 2023
Tzavidis, Nikos
431ec55d-c147-466d-9c65-0f377b0c1f6a
Luna Hernandez, Angela
b4de50ed-b80a-4202-aaad-c97d057369ed
Zhang, Li-Chun
a5d48518-7f71-4ed9-bdcb-6585c2da3649
Hossain, Jamal (2023) Statistical estimation and inference with aggregated and displaced georeferenced data. University of Southampton, Doctoral Thesis, 159pp.
Record type: Thesis (Doctoral)
Text: PhD-Thesis-Jamal-Hossain-PDF-A - Version of Record
Text: Final-thesis-submission-Examination-Mr-Md-Hossain - Restricted to Repository staff only
More information
Published date: November 2023
Identifiers
Local EPrints ID: 484015
URI: http://eprints.soton.ac.uk/id/eprint/484015
PURE UUID: 979346ca-5e70-450c-8f0e-1c2d18e0118e
Catalogue record
Date deposited: 09 Nov 2023 17:30
Last modified: 11 Jun 2024 01:46
Contributors
Author: Jamal Hossain