Predicting the demographics of Twitter users with programmatic weak supervision
Tonglet, Jonathan
4f72888a-9922-41e5-b8c0-c2ad5c68e0df
Jehoul, Astrid
857dec54-b86e-426a-8d13-296128c3737c
Reusens, Manon
3dc14c4b-793a-41d6-b7bd-64303cda1c42
Reusens, Michael
4264e5fa-ed9c-4446-ae74-a4248ae94a49
Baesens, Bart
f7c6496b-aa7f-4026-8616-ca61d9e216f0
Tonglet, Jonathan, Jehoul, Astrid, Reusens, Manon, Reusens, Michael and Baesens, Bart (2024) Predicting the demographics of Twitter users with programmatic weak supervision. International Transactions in Operational Research. (In Press)
Abstract
Predicting the demographics of Twitter users has become a problem of considerable interest in computational social science. However, the limited number of public datasets with ground-truth labels and the high cost of hand-labeling make this task particularly challenging. Recently, programmatic weak supervision has emerged as a new framework for training classifiers on noisy data with minimal human labeling effort. In this paper, demographic prediction is framed for the first time as a programmatic weak supervision problem. A new three-step methodology for gender, age category, and location prediction is provided, which outperforms traditional programmatic weak supervision and is competitive with the state-of-the-art deep learning model. The study is performed in Flanders, a small Dutch-speaking European region, characterized by a limited number of user profiles and tweets. An evaluation conducted on an independent hand-labeled test set shows that the proposed methodology generalizes to unseen users within the geographic area of interest.
Text
Paper_submission_TOP (2)
- Accepted Manuscript
Restricted to Repository staff only until 22 January 2026.
More information
Accepted/In Press date: 22 January 2024
Identifiers
Local EPrints ID: 486509
URI: http://eprints.soton.ac.uk/id/eprint/486509
ISSN: 0969-6016
PURE UUID: 8f351bcd-d18d-496b-9aef-61265f6e5efb
Catalogue record
Date deposited: 24 Jan 2024 17:59
Last modified: 18 Mar 2024 02:59
Contributors
Author: Jonathan Tonglet
Author: Astrid Jehoul
Author: Manon Reusens
Author: Michael Reusens