The University of Southampton
University of Southampton Institutional Repository

Predicting the demographics of Twitter users with programmatic weak supervision

Tonglet, Jonathan, Jehoul, Astrid, Reusens, Manon, Reusens, Michael and Baesens, Bart (2024) Predicting the demographics of Twitter users with programmatic weak supervision. International Transactions in Operational Research. (In Press)

Record type: Article

Abstract

Predicting the demographics of Twitter users has become a problem of great interest in computational social sciences. However, the limited number of public datasets with ground truth labels and the tremendous cost of hand-labeling make this task particularly challenging. Recently, programmatic weak supervision has emerged as a new framework to train classifiers on noisy data with minimal human labeling effort. In this paper, demographic prediction is framed for the first time as a programmatic weak supervision problem. A new three-step methodology for gender, age category, and location prediction is provided, which outperforms traditional programmatic weak supervision and is competitive with the state-of-the-art deep learning model. The study is performed in Flanders, a small Dutch-speaking European region characterized by a limited number of user profiles and tweets. An evaluation conducted on an independent hand-labeled test set shows that the proposed methodology can be generalized to unseen users within the geographic area of interest.
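The abstract names programmatic weak supervision without showing what it looks like in practice. The following is a minimal, generic sketch, not the authors' three-step methodology: hand-written labeling functions each vote on a label or abstain, and their votes are combined by simple majority. The profile fields ("location", "bio"), the binary Flanders/Other label, the keyword lists, and every function name are hypothetical assumptions chosen for illustration. Programmatic weak supervision frameworks typically replace the majority vote with a label model that estimates how accurate each labeling function is.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical label set; None means a labeling function abstains.
FLANDERS = "Flanders"
OTHER = "Other"
ABSTAIN: Optional[str] = None

# Hypothetical keyword lists, chosen only for illustration.
FLEMISH_CITIES = {"antwerpen", "gent", "brugge", "leuven", "hasselt"}
FOREIGN_CITIES = {"amsterdam", "rotterdam", "paris", "london"}

User = Dict[str, str]  # assumed profile shape: {"location": ..., "bio": ...}
LabelingFunction = Callable[[User], Optional[str]]


def lf_flemish_city(user: User) -> Optional[str]:
    """Vote FLANDERS if the profile location mentions a Flemish city."""
    loc = user.get("location", "").lower()
    return FLANDERS if any(city in loc for city in FLEMISH_CITIES) else ABSTAIN


def lf_foreign_city(user: User) -> Optional[str]:
    """Vote OTHER if the profile location mentions a city outside Flanders."""
    loc = user.get("location", "").lower()
    return OTHER if any(city in loc for city in FOREIGN_CITIES) else ABSTAIN


def lf_bio_mentions_region(user: User) -> Optional[str]:
    """Vote FLANDERS if the bio names the region in Dutch or English."""
    bio = user.get("bio", "").lower()
    return FLANDERS if "vlaanderen" in bio or "flanders" in bio else ABSTAIN


LABELING_FUNCTIONS: List[LabelingFunction] = [
    lf_flemish_city,
    lf_foreign_city,
    lf_bio_mentions_region,
]


def majority_vote(user: User, lfs: List[LabelingFunction]) -> Optional[str]:
    """Combine non-abstaining votes; abstain when there are no votes or a tie."""
    votes = [v for v in (lf(user) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting votes with equal support
    return ranked[0][0]


if __name__ == "__main__":
    users = [
        {"location": "Gent, België", "bio": "Fietser uit Vlaanderen"},
        {"location": "Amsterdam", "bio": ""},
        {"location": "", "bio": "coffee lover"},
        {"location": "Leuven / London", "bio": ""},
    ]
    for u in users:
        print(u["location"] or "(empty)", "->", majority_vote(u, LABELING_FUNCTIONS))
```

Here the first profile receives "Flanders" from two agreeing votes, the second receives "Other", the third gets no votes, and the fourth abstains because its two votes conflict. Substring rules like these are deliberately noisy (for example, "gent" also matches inside "Argentina"); absorbing that noise without hand-labeling every user is the motivation for weak supervision.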

Text
Paper_submission_TOP (2) - Accepted Manuscript
Restricted to Repository staff only until 22 January 2026.

More information

Accepted/In Press date: 22 January 2024

Identifiers

Local EPrints ID: 486509
URI: http://eprints.soton.ac.uk/id/eprint/486509
ISSN: 0969-6016
PURE UUID: 8f351bcd-d18d-496b-9aef-61265f6e5efb
ORCID for Bart Baesens: orcid.org/0000-0002-5831-5668

Catalogue record

Date deposited: 24 Jan 2024 17:59
Last modified: 18 Mar 2024 02:59

Contributors

Author: Jonathan Tonglet
Author: Astrid Jehoul
Author: Manon Reusens
Author: Michael Reusens
Author: Bart Baesens