The University of Southampton
University of Southampton Institutional Repository

Community-Based Bayesian Aggregation Models for Crowdsourcing


Venanzi, Matteo, Guiver, John, Kazai, Gabriella, Kohli, Pushmeet and Shokouhi, Milad (2014) Community-Based Bayesian Aggregation Models for Crowdsourcing. The 23rd International World Wide Web Conference (WWW 2014). pp. 155-164. (doi:10.1145/2566486.2567989).

Record type: Conference or Workshop Item (Paper)

Abstract

This paper addresses the problem of extracting accurate labels from crowdsourced datasets, a key challenge in crowdsourcing. Prior work has focused on modeling the reliability of individual workers, for instance, by way of confusion matrices, and using these latent traits to estimate the true labels more accurately. However, this strategy becomes ineffective when there are too few labels per worker to reliably estimate their quality. To mitigate this issue, we propose a novel community-based Bayesian label aggregation model, CommunityBCC, which assumes that crowd workers conform to a few different types, where each type represents a group of workers with similar confusion matrices. We assume that each worker belongs to a certain community, where the worker’s confusion matrix is similar to (a perturbation of) the community’s confusion matrix. Our model can then learn a set of key latent features: (i) the confusion matrix of each community, (ii) the community membership of each worker, and (iii) the aggregated label of each item. We compare the performance of our model against established aggregation methods on a number of large-scale, real-world crowdsourcing datasets. Our experimental results show that our CommunityBCC model consistently outperforms state-of-the-art label aggregation methods, gaining, on average, 8% more accuracy with the same number of labels.
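The generative assumption in the abstract (each worker's confusion matrix is a perturbation of their community's) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's model or its inference procedure: the abstract does not say how the perturbation is defined, so Gaussian noise on log-scores followed by a softmax is assumed here, and the names `perturb_confusion_matrix`, `simulate_labels`, `COMMUNITY_SCORES`, and `noise_std` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Convert a list of real-valued scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def perturb_confusion_matrix(community_scores, noise_std, rng):
    """Build a worker confusion matrix as a noisy copy of the community's.

    community_scores[c][l] is an unnormalised log-score for reporting label l
    when the true label is c. Gaussian noise plus a softmax is an ASSUMED
    perturbation scheme, chosen here only for illustration.
    """
    return [
        softmax([s + rng.gauss(0.0, noise_std) for s in row])
        for row in community_scores
    ]


def simulate_labels(true_labels, community_scores, membership, noise_std, seed=0):
    """Sample one label per (worker, item) pair.

    membership[k] is the community index of worker k. Returns the per-worker
    confusion matrices and a labels[k][i] table of reported labels.
    """
    rng = random.Random(seed)
    workers = [
        perturb_confusion_matrix(community_scores[m], noise_std, rng)
        for m in membership
    ]
    num_classes = len(community_scores[0])
    labels = [
        [rng.choices(range(num_classes), weights=matrix[c])[0] for c in true_labels]
        for matrix in workers
    ]
    return workers, labels


# Two hypothetical communities over two classes.
COMMUNITY_SCORES = [
    [[2.0, 0.0], [0.0, 2.0]],  # community 0: mostly correct
    [[0.0, 0.0], [0.0, 0.0]],  # community 1: uniform guessing
]
```

For example, `simulate_labels([0, 1, 1, 0], COMMUNITY_SCORES, [0, 0, 1], noise_std=0.1)` draws labels for three workers, two from the accurate community and one from the guessing community. A label aggregation model of the kind described above would work in the opposite direction, inferring the community matrices, the memberships, and the true labels from the observed labels alone.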


More information

Published date: May 2014
Venue - Dates: the 23rd International World Wide Web Conference (WWW 2014), 2014-05-01
Organisations: Agents, Interactions & Complexity

Identifiers

Local EPrints ID: 362614
URI: http://eprints.soton.ac.uk/id/eprint/362614
PURE UUID: a379a2ce-ac9e-4b16-9a31-851ee73e4a27

Catalogue record

Date deposited: 27 Feb 2014 15:47
Last modified: 14 Mar 2024 16:10


Contributors

Author: Matteo Venanzi
Author: John Guiver
Author: Gabriella Kazai
Author: Pushmeet Kohli
Author: Milad Shokouhi
