The University of Southampton
University of Southampton Institutional Repository

GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data

Thomson, Dana, Renee, Stevens, Forrest R., Ruktanonchai, Nick, Tatem, Andrew and Castro, Marcia C. (2017) GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data. International Journal of Health Geographics, 16. (doi:10.1186/s12942-017-0098-4).

Record type: Article

Abstract

Background
Household survey data are collected by governments, international organizations, and companies to prioritize policies and allocate billions of dollars. Survey samples are typically selected from recent census data; however, census data are often outdated or inaccurate. This paper describes how gridded population data might instead be used as a sample frame, and introduces the R GridSample algorithm for selecting primary sampling units (PSUs) for complex household surveys with gridded population data. Given a gridded population dataset and a geographic boundary of the study area, GridSample uses a two-step process: it samples “seed” cells with probability proportionate to estimated population size, then “grows” PSUs until each PSU reaches a minimum population. The algorithm permits stratification and oversampling of urban or rural areas. The approximately uniform size and shape of grid cells allow for spatial oversampling, which is not possible in typical surveys, possibly improving small area estimates from survey results.
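The two-step process described above can be illustrated with a minimal Python sketch. It is not the GridSample R code: the function names (`pps_systematic`, `grow_psu`, `grid_sample`), the use of systematic sampling to implement probability proportionate to size, and the breadth-first growth over four-connected neighbours are all assumptions made for illustration, and the package's actual selection and growth rules may differ.

```python
import random
from collections import deque


def pps_systematic(weights, n, rng):
    """Pick n indices with probability proportionate to `weights`.

    Assumption: systematic PPS with a random start. A cell whose weight
    exceeds the sampling interval could be picked more than once.
    """
    step = sum(weights) / n
    start = rng.uniform(0, step)
    picks, cum, i = [], 0.0, 0
    for k in range(n):
        target = start + k * step
        # Advance until the cumulative population interval covers the target.
        while i < len(weights) - 1 and cum + weights[i] <= target:
            cum += weights[i]
            i += 1
        picks.append(i)
    return picks


def grow_psu(pop, seed, min_pop, taken):
    """Grow a PSU outward from `seed` until it holds at least `min_pop` people.

    Assumption: breadth-first growth over 4-connected neighbours, skipping
    cells that already belong to another PSU (tracked in `taken`).
    """
    rows, cols = len(pop), len(pop[0])
    cells, total = [], 0.0
    queue, seen = deque([seed]), {seed}
    while queue and total < min_pop:
        r, c = queue.popleft()
        cells.append((r, c))
        total += pop[r][c]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (r + dr, c + dc)
            inside = 0 <= nxt[0] < rows and 0 <= nxt[1] < cols
            if inside and nxt not in seen and nxt not in taken:
                seen.add(nxt)
                queue.append(nxt)
    taken.update(cells)
    return cells, total


def grid_sample(pop, n_psu, min_pop, seed=0):
    """Step 1: sample seed cells by PPS. Step 2: grow each seed into a PSU."""
    rng = random.Random(seed)
    cols = len(pop[0])
    flat = [value for row in pop for value in row]
    seeds = [divmod(i, cols) for i in pps_systematic(flat, n_psu, rng)]
    taken = set(seeds)  # reserve every seed so no PSU grows over another's seed
    return [grow_psu(pop, s, min_pop, taken) for s in seeds]


# Toy 6 x 6 grid with 100 people per cell (made-up numbers, not WorldPop data).
toy_grid = [[100] * 6 for _ in range(6)]
for cells, total in grid_sample(toy_grid, n_psu=3, min_pop=300):
    print(cells, total)
```

Stratification and urban or rural oversampling are left out of the sketch; one way to add them would be to run the same two steps separately within each stratum.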

Results
We replicated the 2010 Rwanda Demographic and Health Survey (DHS) in GridSample by sampling the WorldPop 2010 UN-adjusted 100 m × 100 m gridded population dataset, stratifying by Rwanda’s 30 districts, and oversampling in urban areas. The 2010 Rwanda DHS had 79 urban PSUs and 413 rural PSUs, with an average PSU population of 610 people. An equivalent sample in GridSample had 75 urban PSUs and 405 rural PSUs, with a median PSU population of 612 people. The number of PSUs differed because DHS added urban PSUs from specific districts while GridSample reallocated rural-to-urban PSUs across all districts.

Conclusions
Gridded population sampling is a promising alternative to typical census-based sampling when census data are moderately outdated or inaccurate. Four approaches to implementation have been tried: (1) using gridded PSU boundaries produced by GridSample, (2) manually segmenting gridded PSUs using satellite imagery, (3) non-probability sampling (e.g. random-walk, “spin-the-pen”), and (4) random sampling of households. Gridded population sampling is in its infancy, and further research is needed to assess its accuracy and feasibility. The GridSample R algorithm can be used to advance this research agenda.

This record has no associated files available for download.

More information

Accepted/In Press date: 4 July 2017
e-pub ahead of print date: 19 July 2017
Published date: 19 July 2017

Identifiers

Local EPrints ID: 415370
URI: http://eprints.soton.ac.uk/id/eprint/415370
ISSN: 1476-072X
PURE UUID: 94b5da3e-4541-4ac8-bf74-eaa984beaba5
ORCID for Dana, Renee Thomson: orcid.org/0000-0002-9507-9123
ORCID for Andrew Tatem: orcid.org/0000-0002-7270-941X

Catalogue record

Date deposited: 08 Nov 2017 17:30
Last modified: 16 Mar 2024 04:11

Contributors

Author: Dana, Renee Thomson
Author: Forrest R. Stevens
Author: Nick Ruktanonchai
Author: Andrew Tatem
Author: Marcia C. Castro
