The University of Southampton
University of Southampton Institutional Repository

Assessing the spatial sensitivity of a random forest model: Application in gridded population modeling

Sinha, Parmanand, Gaughan, Andrea E., Stevens, Forrest R., Nieves, Jeremiah J., Sorichetta, Alessandro and Tatem, Andrew J. (2019) Assessing the spatial sensitivity of a random forest model: Application in gridded population modeling. Computers, Environment and Urban Systems, 75, 132-145. (doi:10.1016/j.compenvurbsys.2019.01.006).

Record type: Article

Abstract

Gridded human population data provide a spatial denominator to identify populations at risk, quantify burdens, and inform our understanding of human-environment systems. When modeling gridded population, the data used to train the model may differ in spatial resolution from the predictions the model produces. This situation arises in top-down, dasymetric population modeling, in which coarse administrative-unit-level population data (i.e., the source units) are redistributed to a finer scale (i.e., the target units). However, issues associated with differing variance across scales, spatial autocorrelation, and bias in sampling techniques are often overlooked. In this study, we examine the effects of intentionally biasing our sampling from the source to the target scale within the context of a weighted, dasymetric mapping approach. The weighting component is based on a Random Forest estimator, a non-parametric, ensemble-based prediction model. We investigate issues of autocorrelation and heterogeneity in the training data using 18 different types of samples to show the variation in training, census-level (i.e., source) predictions and output, grid-level (i.e., target) predictions. We compare results to simple random sampling and geographically stratified random sampling. Results indicate that the Random Forest model is sensitive to the spatial autocorrelation inherent in the training data, which leads to an increase in the variance of the residuals. Sample training datasets that are at a spatial scale representative of the true population produced the best-fitting models. However, the true representative dataset varied in autocorrelation for both scales. More attention is needed when applying ensemble-based learning to spatially heterogeneous data, as underlying issues of spatial autocorrelation influence results for both the census-level and grid-level estimations.
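
The redistribution step of a weighted, dasymetric approach can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `dasymetric_redistribute`, the unit labels, and the weights are hypothetical, and the weights are simply assumed to be given (in the study they are derived from Random Forest predictions). Each source-unit total is split across its grid cells in proportion to the cell weights, so the cell estimates sum back to the source total.

```python
from collections import Counter, defaultdict


def dasymetric_redistribute(unit_totals, cell_units, cell_weights):
    """Split each source-unit total across its grid cells, proportional to weight.

    unit_totals  : dict mapping source-unit id -> population count
    cell_units   : list with the source-unit id of every grid cell
    cell_weights : list with one non-negative weight per grid cell
                   (assumed here; e.g. a model-predicted density)
    """
    # Sum the weights and count the cells inside each source unit.
    weight_sums = defaultdict(float)
    for unit, weight in zip(cell_units, cell_weights):
        weight_sums[unit] += weight
    cell_counts = Counter(cell_units)

    estimates = []
    for unit, weight in zip(cell_units, cell_weights):
        total = unit_totals[unit]
        if weight_sums[unit] > 0:
            estimates.append(total * weight / weight_sums[unit])
        else:
            # All weights are zero: fall back to an equal share per cell.
            estimates.append(total / cell_counts[unit])
    return estimates


# Hypothetical example: two source units, five grid cells.
unit_totals = {"A": 100.0, "B": 50.0}
cell_units = ["A", "A", "A", "B", "B"]
cell_weights = [1.0, 3.0, 6.0, 2.0, 2.0]
cell_population = dasymetric_redistribute(unit_totals, cell_units, cell_weights)
```

Because the allocation is proportional within each source unit, census totals are preserved by construction; only the within-unit distribution depends on the weights, which is where the sensitivity of the weighting model to its training sample would show up.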

Text
1-s2.0-S0198971518302862-main - Version of Record

More information

Accepted/In Press date: 12 January 2019
e-pub ahead of print date: 8 February 2019
Published date: 1 May 2019
Keywords: Dasymetric modeling, Gridded population modeling, Random forest, Spatial autocorrelation

Identifiers

Local EPrints ID: 428254
URI: https://eprints.soton.ac.uk/id/eprint/428254
ISSN: 0198-9715
PURE UUID: f6e56bb6-7a2d-4c56-bd46-c69f70c506c1
ORCID for Alessandro Sorichetta: orcid.org/0000-0002-3576-5826
ORCID for Andrew J. Tatem: orcid.org/0000-0002-7270-941X

Catalogue record

Date deposited: 19 Feb 2019 17:30
Last modified: 10 Dec 2019 01:37

Contributors

Author: Parmanand Sinha
Author: Andrea E. Gaughan
Author: Forrest R. Stevens
Author: Jeremiah J. Nieves
Author: Alessandro Sorichetta
Author: Andrew J. Tatem