GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data
Background
Household survey data are collected by governments, international organizations, and companies to prioritize policies and allocate billions of dollars. Survey samples are typically selected from recent census data; however, census data are often outdated or inaccurate. This paper describes how gridded population data might instead be used as a sample frame, and introduces the GridSample R algorithm for selecting primary sampling units (PSUs) for complex household surveys from gridded population data. Given a gridded population dataset and a geographic boundary of the study area, GridSample uses a two-step process: it samples “seed” cells with probability proportional to estimated population size, then “grows” PSUs until each PSU reaches a minimum population. The algorithm permits stratification and oversampling of urban or rural areas. Because grid cells are approximately uniform in size and shape, GridSample also allows spatial oversampling, which is not possible in typical surveys and may improve small-area estimates derived from survey results.
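The two-step seed-and-grow process can be illustrated with a small Python sketch under stated assumptions. It is not the GridSample R implementation. `pps_systematic`, `grow_psu`, and `sample_psus` are hypothetical names. Systematic PPS is used here as one standard way to draw cells with probability proportional to population. The growth rule (add the most populous unassigned 4-neighbour cell until a minimum population is reached) is an assumption, as is skipping seeds that fall inside an already-grown PSU. Stratification and oversampling are omitted; the function could be run separately within each stratum.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def pps_systematic(weights, n, rng):
    """Return n indices drawn with probability proportional to weights
    (systematic PPS). Zero-weight cells are never drawn; a cell whose
    weight exceeds the sampling interval may be drawn more than once."""
    cum = list(accumulate(weights))
    step = cum[-1] / n
    start = rng.uniform(0, step)
    picks = []
    for k in range(n):
        point = start + k * step
        # First cell whose cumulative population exceeds the selection point;
        # min() guards against floating-point overshoot at the very end.
        picks.append(min(bisect_right(cum, point), len(cum) - 1))
    return picks


def grow_psu(grid, seed, min_pop, taken):
    """Hypothetical growth rule: starting from `seed`, repeatedly add the most
    populous 4-neighbour cell not already in any PSU until min_pop is reached
    or no eligible neighbour remains. Marks the PSU's cells in `taken`."""
    rows, cols = len(grid), len(grid[0])
    psu = {seed}
    pop = grid[seed[0]][seed[1]]
    while pop < min_pop:
        frontier = set()
        for r, c in psu:
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in psu and (nr, nc) not in taken):
                    frontier.add((nr, nc))
        if not frontier:
            break  # boxed in by the grid edge or by other PSUs
        # Most populous neighbour; ties broken toward the smaller row, then column.
        best = max(frontier, key=lambda rc: (grid[rc[0]][rc[1]], -rc[0], -rc[1]))
        psu.add(best)
        pop += grid[best[0]][best[1]]
    taken.update(psu)
    return psu, pop


def sample_psus(grid, n_psu, min_pop, seed=0):
    """Step 1: PPS-sample seed cells. Step 2: grow each seed into a PSU."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    weights = [grid[r][c] for r, c in cells]
    taken = set()
    psus = []
    for idx in pps_systematic(weights, n_psu, rng):
        if cells[idx] in taken:
            continue  # hypothetical rule: skip seeds already inside a PSU
        psus.append(grow_psu(grid, cells[idx], min_pop, taken))
    return psus


if __name__ == "__main__":
    toy_grid = [[30] * 10 for _ in range(10)]  # toy 10 x 10 grid, 30 people per cell
    for cells, pop in sample_psus(toy_grid, n_psu=3, min_pop=100):
        print(sorted(cells), pop)
```

On the uniform toy grid each PSU needs four cells (4 × 30 = 120 people) to pass the 100-person minimum. With real gridded population data, sparsely populated seeds would grow into larger PSUs and dense urban seeds into smaller ones.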
Results
We replicated the 2010 Rwanda Demographic and Health Survey (DHS) in GridSample by sampling the WorldPop 2010 UN-adjusted 100 m × 100 m gridded population dataset, stratifying by Rwanda’s 30 districts, and oversampling in urban areas. The 2010 Rwanda DHS had 79 urban PSUs and 413 rural PSUs, with an average PSU population of 610 people. An equivalent sample in GridSample had 75 urban PSUs, 405 rural PSUs, and a median PSU population of 612 people. The number of PSUs differed because DHS added urban PSUs in specific districts, whereas GridSample reallocated PSUs from rural to urban areas across all districts.
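The allocation difference described above can be illustrated with a small Python sketch. It is not GridSample's code, and the source does not give GridSample's exact allocation rule. `largest_remainder`, `allocate_psus`, and the `urban_share` parameter are hypothetical names, and the toy district populations are invented for illustration. The assumed rule fixes the total number of urban PSUs first (oversampling urban areas whenever `urban_share` exceeds their population share), then spreads urban and rural PSUs across all districts in proportion to population using largest-remainder rounding.

```python
def largest_remainder(weights, n):
    """Split n PSUs across keys in proportion to weights
    (Hamilton / largest-remainder rounding)."""
    total = sum(weights.values())
    quotas = {k: n * w / total for k, w in weights.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = n - sum(alloc.values())
    # Hand the remaining PSUs to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


def allocate_psus(pop, n_total, urban_share):
    """pop maps (district, "urban" | "rural") -> population.
    Hypothetical rule: set the urban PSU total to urban_share * n_total,
    then distribute urban and rural PSUs across all districts by population."""
    n_urban = round(urban_share * n_total)
    urban = {k: v for k, v in pop.items() if k[1] == "urban"}
    rural = {k: v for k, v in pop.items() if k[1] == "rural"}
    alloc = largest_remainder(urban, n_urban)
    alloc.update(largest_remainder(rural, n_total - n_urban))
    return alloc


if __name__ == "__main__":
    toy = {("A", "urban"): 1000, ("A", "rural"): 9000,
           ("B", "urban"): 3000, ("B", "rural"): 7000}
    print(allocate_psus(toy, n_total=20, urban_share=0.4))
```

In this toy example urban areas hold 20% of the population, so proportional allocation would give them 4 of 20 PSUs. Setting `urban_share=0.4` gives them 8 (2 in district A and 6 in district B), leaving 7 rural PSUs for district A and 5 for district B.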
Conclusions
Gridded population sampling is a promising alternative to typical census-based sampling when census data are moderately outdated or inaccurate. Four approaches to implementation have been tried: (1) using gridded PSU boundaries produced by GridSample, (2) manually segmenting gridded PSUs using satellite imagery, (3) non-probability sampling (e.g. random-walk, “spin-the-pen”), and (4) random sampling of households. Gridded population sampling is in its infancy, and further research is needed to assess its accuracy and feasibility. The GridSample R algorithm can be used to advance this research agenda.
Thomson, Dana Renee
Stevens, Forrest R.
Ruktanonchai, Nick
Tatem, Andrew
Castro, Marcia C.
19 July 2017
Thomson, Dana Renee, Stevens, Forrest R., Ruktanonchai, Nick, Tatem, Andrew and Castro, Marcia C. (2017) GridSample: an R package to generate household survey primary sampling units (PSUs) from gridded population data. International Journal of Health Geographics, 16. (doi:10.1186/s12942-017-0098-4).
More information
Accepted/In Press date: 4 July 2017
e-pub ahead of print date: 19 July 2017
Published date: 19 July 2017
Identifiers
Local EPrints ID: 415370
URI: http://eprints.soton.ac.uk/id/eprint/415370
ISSN: 1476-072X
PURE UUID: 94b5da3e-4541-4ac8-bf74-eaa984beaba5
Catalogue record
Date deposited: 08 Nov 2017 17:30
Last modified: 16 Mar 2024 04:11