Let’s agree to disagree: Fixing agreement measures for crowdsourcing
Checco, Alessandro, Roitero, Kevin, Maddalena, Eddy, Mizzaro, Stefano and Demartini, Gianluca (2017) Let’s agree to disagree: Fixing agreement measures for crowdsourcing. In Proceedings of the Fifth Conference on Human Computation and Crowdsourcing (HCOMP 2017), pp. 11-20. AAAI Press.
Record type: Conference or Workshop Item (Paper)
Abstract
In the context of micro-task crowdsourcing, each task is usually performed by several workers. This allows researchers to leverage measures of agreement among workers on the same task to estimate the reliability of the collected data and to better understand the answering behavior of participants. While many measures of agreement between annotators have been proposed, they are known to suffer from many problems and abnormalities. In this paper, we identify the main limits of the existing agreement measures in the crowdsourcing context, both by means of toy examples and with real-world crowdsourcing data, and we propose a novel agreement measure based on probabilistic parameter estimation which overcomes these limits. We validate the new agreement measure and show its flexibility compared to the existing agreement measures.
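For readers unfamiliar with inter-rater agreement, the sketch below (not taken from the paper, and not the probabilistic measure it proposes) shows how two standard measures, raw percent agreement and Fleiss' kappa, can be computed over crowdsourced labels, assuming every task receives the same number of judgments. The label data is hypothetical.

# Minimal sketch: percent agreement and Fleiss' kappa for crowdsourced
# labels, assuming a fixed number of workers per task. Hypothetical data.

from collections import Counter
from itertools import combinations

# labels[i] = list of worker answers collected for task i
labels = [
    ["relevant", "relevant", "not relevant"],
    ["relevant", "relevant", "relevant"],
    ["not relevant", "not relevant", "relevant"],
]

def percent_agreement(labels):
    """Fraction of worker pairs per task giving the same answer, averaged over tasks."""
    per_task = []
    for answers in labels:
        pairs = list(combinations(answers, 2))
        per_task.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_task) / len(per_task)

def fleiss_kappa(labels):
    """Chance-corrected agreement for a fixed number of workers per task."""
    categories = sorted({a for answers in labels for a in answers})
    n = len(labels[0])                      # workers per task
    N = len(labels)                         # number of tasks
    counts = [Counter(answers) for answers in labels]

    # Observed agreement, averaged over tasks
    P_bar = sum(
        (sum(c[cat] ** 2 for cat in categories) - n) / (n * (n - 1))
        for c in counts
    ) / N

    # Expected agreement from the overall category distribution
    p_j = [sum(c[cat] for c in counts) / (N * n) for cat in categories]
    P_e = sum(p ** 2 for p in p_j)

    return (P_bar - P_e) / (1 - P_e)

print(f"percent agreement: {percent_agreement(labels):.3f}")
print(f"Fleiss' kappa:     {fleiss_kappa(labels):.3f}")

On this toy data the kappa value is 0 even though raw pairwise agreement is above 0.5, illustrating how chance correction can drive a kappa-style measure to zero even when raw agreement is moderate.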
More information
Published date: October 2017
Keywords: crowdsourcing, inter-rater agreement, reliability
Identifiers
Local EPrints ID: 420745
URI: http://eprints.soton.ac.uk/id/eprint/420745
PURE UUID: 06310f33-54d5-4d64-9f9a-c4492c366fa1
Catalogue record
Date deposited: 14 May 2018 16:30
Last modified: 15 Mar 2024 19:39
Contributors
Author: Alessandro Checco
Author: Kevin Roitero
Author: Eddy Maddalena
Author: Stefano Mizzaro
Author: Gianluca Demartini