The University of Southampton
University of Southampton Institutional Repository

Social media data for population mapping: a Bayesian approach to address representativeness and privacy challenges

stat.AP, stat.ME
arXiv
Andrich, Paolo
Lai, Shengjie
Jun, Halim
Duan, Qianwen
Cheng, Zhifeng
Flaxman, Seth R.
Tatem, Andrew J.

Record type: UNSPECIFIED

Abstract

Accurate and timely population data are essential for disaster response and humanitarian planning, but traditional censuses often cannot capture rapid demographic changes. Social media data offer a promising alternative for dynamic population monitoring, but their representativeness remains poorly understood and stringent privacy requirements limit their reliability. Here, we address these limitations in the context of the Philippines by calibrating Facebook user counts against the country's 2020 census figures. First, we find that differential privacy techniques commonly applied to social media-based population datasets disproportionately mask low-population areas. To address this, we propose a Bayesian imputation approach to recover missing values, restoring data coverage for $5.5\%$ of rural areas. Second, using the imputed social media data and predictors such as urbanisation level, demographic composition, and socio-economic status, we develop a statistical model for the proportion of Facebook users in each municipality, which links observed Facebook user numbers to true population levels. Out-of-sample validation shows that the results generalise well, with errors as low as ${\approx}18\%$ and ${\approx}24\%$ for urban and rural Facebook user proportions, respectively. We further demonstrate that accounting for overdispersion and spatial correlations in the data is crucial for obtaining accurate estimates and appropriate credible intervals. As the predictors change over time, the models can be used to update the population predictions regularly, providing a dynamic complement to census-based estimates. These results have direct implications for humanitarian response in disaster-prone regions and offer a general framework for using biased social media signals to generate reliable and timely population data.
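
The link between observed Facebook user counts and population described in the abstract can be illustrated with a minimal sketch. This record does not give the paper's model specification, so everything below is an assumption made for illustration: a logistic link between covariates and the Facebook user proportion, a simple ratio estimate of population, and a beta-binomial variance formula to show what overdispersion means relative to a plain binomial. The function names, coefficients, and the `rho` parameter are hypothetical and are not taken from the paper.

```python
import math


def inverse_logit(x: float) -> float:
    """Map a real-valued linear predictor to a proportion in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_proportion(intercept: float, coefs: list[float],
                         covariates: list[float]) -> float:
    """Hypothetical logistic model for the share of Facebook users.

    The covariates stand in for predictors such as urbanisation level;
    the coefficients are placeholders, not fitted values.
    """
    linear = intercept + sum(b * x for b, x in zip(coefs, covariates))
    return inverse_logit(linear)


def estimate_population(fb_users: float, proportion: float) -> float:
    """Ratio estimate: population is roughly user count / user proportion."""
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must lie in (0, 1]")
    return fb_users / proportion


def binomial_variance(n: int, p: float) -> float:
    """Variance of a binomial count with n trials and success probability p."""
    return n * p * (1.0 - p)


def beta_binomial_variance(n: int, p: float, rho: float) -> float:
    """Variance of a beta-binomial count with mean n * p.

    rho in [0, 1) is the intra-class correlation; rho = 0 recovers the
    binomial variance, and larger rho inflates it (overdispersion).
    """
    return n * p * (1.0 - p) * (1.0 + (n - 1) * rho)


if __name__ == "__main__":
    # Placeholder inputs for one hypothetical municipality.
    p = predicted_proportion(intercept=-0.5, coefs=[0.8], covariates=[1.0])
    print("predicted Facebook user proportion:", round(p, 3))
    print("population estimate:", round(estimate_population(12_000, p)))
    print("binomial variance:", binomial_variance(1_000, p))
    print("beta-binomial variance:", beta_binomial_variance(1_000, p, 0.05))
```

In this simplified setting, ignoring overdispersion amounts to setting `rho = 0`, which understates the variance of the counts and would therefore give intervals that are too narrow. The sketch does not attempt the Bayesian imputation or the spatial correlation structure mentioned in the abstract.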

Text
2601.22104v1 - Author's Original
Available under License Creative Commons Attribution.

More information

Published date: 29 January 2026
Additional Information: 25 pages, 8 figures
Keywords: stat.AP, stat.ME

Identifiers

Local EPrints ID: 509696
URI: http://eprints.soton.ac.uk/id/eprint/509696
PURE UUID: 7cd4c1b1-3f0a-477d-8355-27fe8b40a21d
ORCID for Shengjie Lai: orcid.org/0000-0001-9781-8148
ORCID for Qianwen Duan: orcid.org/0000-0003-4342-5044
ORCID for Andrew J. Tatem: orcid.org/0000-0002-7270-941X

Catalogue record

Date deposited: 02 Mar 2026 18:01
Last modified: 03 Mar 2026 03:10

Contributors

Author: Paolo Andrich
Author: Shengjie Lai
Author: Halim Jun
Author: Qianwen Duan
Author: Zhifeng Cheng
Author: Seth R. Flaxman
Author: Andrew J. Tatem
