The University of Southampton
University of Southampton Institutional Repository

Linking synthetic populations to household geolocations: a demonstration in Namibia

Thomson-Browne, Dana, Renee
Kools, Lieke
Jochem, Warren

Thomson-Browne, Dana, Renee, Kools, Lieke and Jochem, Warren (2018) Linking synthetic populations to household geolocations: a demonstration in Namibia. Data, 3 (3). (doi:10.3390/data3030030).

Record type: Article

Abstract

Evaluating gridded population dataset estimates (e.g., WorldPop, LandScan) or household survey sample designs requires a population census linked to residential locations. Geolocated census microdata, however, are almost never available and are thus best simulated. In this paper, we simulate a close-to-reality population of individuals nested in households geolocated to realistic building locations. Using the R simPop package and ArcGIS, multiple realizations of a geolocated synthetic population are derived from the Namibia 2011 census 20% microdata sample, Namibia census enumeration area boundaries, the Namibia 2013 Demographic and Health Survey (DHS), and dozens of spatial covariates derived from publicly available datasets. Realistic household latitude-longitude coordinates are manually generated based on public satellite imagery. Simulated households are linked to latitude-longitude coordinates by identifying distinct household types with multivariate k-means analysis and modelling a probability surface for each household type using Random Forest machine learning methods. We simulate five realizations of a synthetic population in Namibia’s Oshikoto region, including demographic, socioeconomic, and outcome characteristics at the level of household, woman, and child. Comparisons of variables in the synthetic population were made with the 2011 census 20% sample and 2013 DHS data by primary sampling unit/enumeration area. We found that synthetic population variable distributions matched the observed data and followed expected spatial patterns. We outline a novel process to simulate a close-to-reality microdata census geolocated to realistic building locations in a low- or middle-income country setting to support spatial demographic research and survey methodological development while avoiding disclosure risk of individuals.
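
The linking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which used R simPop, ArcGIS, k-means and Random Forest. It assumes that household types and a per-type weight for every candidate location are already available. The weights stand in for the modelled probability surface and are invented here for illustration. The function name `link_households`, the sampling without replacement, and the uniform fallback are all assumptions made for this example.

```python
import random


def link_households(households, locations, weights, seed=0):
    """Assign each household to one unused candidate location.

    households: list of (household_id, household_type) tuples
    locations:  list of (lat, lon) tuples
    weights:    dict mapping household_type to a list of non-negative
                weights, one per location (hypothetical stand-in for a
                modelled probability surface)
    """
    rng = random.Random(seed)
    free = list(range(len(locations)))  # indices of locations not yet used
    linked = {}
    for hh_id, hh_type in households:
        if not free:
            raise ValueError("more households than candidate locations")
        w = [weights[hh_type][i] for i in free]
        if sum(w) <= 0:
            # Assumption: fall back to a uniform draw when every remaining
            # location has zero weight for this household type.
            w = [1.0] * len(free)
        pick = rng.choices(range(len(free)), weights=w, k=1)[0]
        linked[hh_id] = locations[free.pop(pick)]
    return linked


# Hypothetical example data: three households, four candidate buildings.
households = [("h1", "small_rural"), ("h2", "large_urban"), ("h3", "small_rural")]
locations = [(-18.80, 17.05), (-18.81, 17.06), (-19.24, 17.71), (-19.25, 17.72)]
weights = {
    "small_rural": [0.4, 0.4, 0.1, 0.1],
    "large_urban": [0.0, 0.0, 0.5, 0.5],
}
linked = link_households(households, locations, weights, seed=42)
```

Running the function with different seeds yields different realizations of the linked population, analogous to the multiple realizations mentioned in the abstract.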

Text
data-03-00030 - Version of Record
Available under License Creative Commons Attribution.

More information

Submitted date: 18 June 2018
Accepted/In Press date: 7 August 2018
e-pub ahead of print date: 9 August 2018
Published date: 9 August 2018

Identifiers

Local EPrints ID: 424621
URI: https://eprints.soton.ac.uk/id/eprint/424621
ISSN: 2306-5729
PURE UUID: 9df08854-c4a6-4996-ad4b-c3414578c648
ORCID for Dana, Renee Thomson-Browne: orcid.org/0000-0002-9507-9123

Catalogue record

Date deposited: 05 Oct 2018 11:39
Last modified: 05 Nov 2019 01:28
