Community-Based Bayesian Aggregation Models for Crowdsourcing
This paper addresses the problem of extracting accurate labels from crowdsourced datasets, a key challenge in crowdsourcing. Prior work has focused on modeling the reliability of individual workers, for instance, by way of confusion matrices, and using these latent traits to estimate the true labels more accurately. However, this strategy becomes ineffective when there are too few labels per worker to reliably estimate their quality. To mitigate this issue, we propose a novel community-based Bayesian label aggregation model, CommunityBCC, which assumes that crowd workers conform to a few different types, where each type represents a group of workers with similar confusion matrices. We assume that each worker belongs to a certain community, where the worker’s confusion matrix is similar to (a perturbation of) the community’s confusion matrix. Our model can then learn a set of key latent features: (i) the confusion matrix of each community, (ii) the community membership of each worker, and (iii) the aggregated label of each item. We compare the performance of our model against established aggregation methods on a number of large-scale, real-world crowdsourcing datasets. Our experimental results show that our CommunityBCC model consistently outperforms state-of-the-art label aggregation methods, gaining, on average, 8% more accuracy with the same number of labels.
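The generative assumptions described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' CommunityBCC implementation: the function names (`perturb`, `simulate`, `aggregate`), the uniform row-wise noise, the uniform class prior and the example matrices are all assumptions made for illustration. It also assumes the community matrices and memberships are already known, whereas the paper learns them by Bayesian inference.

```python
import math
import random

def perturb(row, noise, rng):
    """Return a noisy copy of a probability row, renormalised to sum to 1."""
    noisy = [max(p + rng.uniform(-noise, noise), 1e-6) for p in row]
    total = sum(noisy)
    return [p / total for p in noisy]

def simulate(n_items, n_workers, labels_per_worker, community_cms, noise, seed=0):
    """Sample true labels, worker communities, worker matrices and votes."""
    rng = random.Random(seed)
    n_classes = len(community_cms[0])
    truth = [rng.randrange(n_classes) for _ in range(n_items)]
    membership = [rng.randrange(len(community_cms)) for _ in range(n_workers)]
    # Assumption: a worker's matrix is a row-wise perturbation of the
    # confusion matrix of the community the worker belongs to.
    worker_cms = [[perturb(row, noise, rng) for row in community_cms[m]]
                  for m in membership]
    votes = []  # tuples of (worker, item, reported label)
    for w in range(n_workers):
        for i in rng.sample(range(n_items), labels_per_worker):
            weights = worker_cms[w][truth[i]]
            reported = rng.choices(range(n_classes), weights=weights)[0]
            votes.append((w, i, reported))
    return truth, membership, worker_cms, votes

def aggregate(votes, cms_by_worker, n_items, n_classes):
    """Most probable label per item under a uniform prior and known matrices."""
    log_post = [[0.0] * n_classes for _ in range(n_items)]
    for w, i, reported in votes:
        for c in range(n_classes):
            # cms_by_worker[w][c][reported] = P(reported label | true class c)
            log_post[i][c] += math.log(cms_by_worker[w][c][reported])
    return [max(range(n_classes), key=lambda c: row[c]) for row in log_post]

# Hypothetical communities: fairly accurate workers and random guessers.
communities = [[[0.9, 0.1], [0.1, 0.9]],
               [[0.5, 0.5], [0.5, 0.5]]]
truth, membership, worker_cms, votes = simulate(200, 100, 20, communities, 0.05)
# Use each worker's community matrix in place of a per-worker estimate.
predicted = aggregate(votes, [communities[m] for m in membership], 200, 2)
accuracy = sum(p == t for p, t in zip(predicted, truth)) / len(truth)
print(f"accuracy using community matrices: {accuracy:.2f}")
```

Using the community matrix for every member is the point of the sketch: when a worker has contributed only a handful of labels, the shared matrix stands in for an individual estimate that could not be learned reliably.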
Pages: 155-164
Venanzi, Matteo
Guiver, John
Kazai, Gabriella
Kohli, Pushmeet
Shokouhi, Milad
May 2014
Venanzi, Matteo, Guiver, John, Kazai, Gabriella, Kohli, Pushmeet and Shokouhi, Milad (2014) Community-Based Bayesian Aggregation Models for Crowdsourcing. In: The 23rd International World Wide Web Conference (WWW 2014), pp. 155-164. (doi:10.1145/2566486.2567989).
Record type: Conference or Workshop Item (Paper)
More information
Published date: May 2014
Venue - Dates: the 23rd International World Wide Web Conference (WWW 2014), 2014-05-01
Organisations: Agents, Interactions & Complexity
Identifiers
Local EPrints ID: 362614
URI: http://eprints.soton.ac.uk/id/eprint/362614
PURE UUID: a379a2ce-ac9e-4b16-9a31-851ee73e4a27
Catalogue record
Date deposited: 27 Feb 2014 15:47
Last modified: 14 Mar 2024 16:10
Contributors
Author: Matteo Venanzi
Author: John Guiver
Author: Gabriella Kazai
Author: Pushmeet Kohli
Author: Milad Shokouhi