The University of Southampton
University of Southampton Institutional Repository

Geospatial foundation-model embeddings improve population estimation unevenly across space and scale

cs.LG
arXiv
Zhang, Wenbin
Cleary, Eimear
Rowe, Francisco
Chaudhuri, Somnath
Bondarenko, Maksym
Lai, Shengjie
Tatem, Andrew J.

Record type: UNSPECIFIED

Abstract

Reliable subnational population estimates are essential for many applications, yet remain difficult to produce where censuses are sparse, outdated or spatially coarse. Existing population-mapping workflows rely on hand-built geospatial covariates, such as settlement extent, night-time lights, and environmental conditions, which must be assembled and harmonised across scales and geographies. Geospatial foundation models offer an alternative by learning reusable representations of place from more multifaceted and heterogeneous data sources. Here, we benchmark Population Dynamics Foundation Model (PDFM) embeddings against harmonised geospatial covariates for subnational population estimation in Brazil, Nigeria and the United States. Under geographically structured validation, PDFM improved predictive fit, with a median 20.1% reduction in unexplained variance (IQR: 10.0-33.2% across country-model comparisons), and reduced Kullback-Leibler divergence by 23.2% (9.2-26.2%). However, these gains were uneven. PDFM was most advantageous where the geospatial covariates only weakly characterised settlement context, such as in larger and less-developed subnational areas. Moreover, PDFM performance was scale-coupled, with embeddings transferring less flexibly across spatial aggregations than geospatial covariates. These findings show that geospatial foundation-model representations of place can improve population estimation in data-poor settings, but that their benefits break down predictably under spatial scale mismatch, revealing a fundamental limitation of current geospatial AI.
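The two headline metrics can be illustrated with a minimal sketch. This record does not say exactly how the paper computes them, so the code assumes one common reading: unexplained variance is taken as 1 - R^2, and Kullback-Leibler divergence is computed between observed and predicted population shares across areas. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math


def unexplained_variance(observed, predicted):
    """Return 1 - R^2: residual sum of squares divided by total sum of squares."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return ss_res / ss_tot


def relative_reduction(baseline, candidate):
    """Percentage by which `candidate` lowers a lower-is-better score vs `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


def kl_divergence(observed, predicted):
    """KL(P || Q), where P and Q are observed and predicted population shares."""
    total_obs = sum(observed)
    total_pred = sum(predicted)
    kl = 0.0
    for o, q in zip(observed, predicted):
        p_share = o / total_obs
        q_share = q / total_pred
        if p_share > 0:  # terms with zero observed share contribute nothing
            kl += p_share * math.log(p_share / q_share)
    return kl


# Toy example with made-up counts for four areas (assumption, not paper data).
observed = [100, 200, 300, 400]
covariate_pred = [150, 150, 350, 350]   # hypothetical covariate-based model
embedding_pred = [120, 180, 320, 380]   # hypothetical embedding-based model

uv_cov = unexplained_variance(observed, covariate_pred)
uv_emb = unexplained_variance(observed, embedding_pred)
print("Reduction in unexplained variance (%):", relative_reduction(uv_cov, uv_emb))

kl_cov = kl_divergence(observed, covariate_pred)
kl_emb = kl_divergence(observed, embedding_pred)
print("Reduction in KL divergence (%):", relative_reduction(kl_cov, kl_emb))
```

Under this reading, a 20.1% reduction in unexplained variance means the embedding-based model leaves about one fifth less residual variance than the covariate-based model on the same held-out areas. It does not mean that R^2 rises by 20.1 percentage points.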

Text
2605.01650v1 - Author's Original
Available under License Creative Commons Attribution.

More information

Published date: 3 May 2026
Keywords: cs.LG

Identifiers

Local EPrints ID: 511795
URI: http://eprints.soton.ac.uk/id/eprint/511795
PURE UUID: e8fe60cb-5a2d-498b-8ba9-644e5d56e102
ORCID for Wenbin Zhang: orcid.org/0000-0002-9295-1019
ORCID for Eimear Cleary: orcid.org/0000-0003-2549-8565
ORCID for Somnath Chaudhuri: orcid.org/0000-0003-4899-1870
ORCID for Maksym Bondarenko: orcid.org/0000-0003-4958-6551
ORCID for Shengjie Lai: orcid.org/0000-0001-9781-8148
ORCID for Andrew J. Tatem: orcid.org/0000-0002-7270-941X

Catalogue record

Date deposited: 02 Jun 2026 16:52
Last modified: 03 Jun 2026 02:09

Contributors

Author: Wenbin Zhang
Author: Eimear Cleary
Author: Francisco Rowe
Author: Somnath Chaudhuri
Author: Maksym Bondarenko
Author: Shengjie Lai
Author: Andrew J. Tatem
