University of Southampton Institutional Repository

On the efficiency of data collection for crowdsourced classification

Manino, Edoardo
Tran-Thanh, Long
Jennings, Nicholas

Manino, Edoardo, Tran-Thanh, Long and Jennings, Nicholas (2018) On the efficiency of data collection for crowdsourced classification. At International Joint Conference on Artificial Intelligence (19/07/18), Stockholm, Sweden. 13 - 19 Jul 2018. 8 pp. (In Press)

Record type: Conference or Workshop Item (Paper)

Abstract

The quality of crowdsourced data is often highly variable. For this reason, it is common to collect redundant data and use statistical methods to aggregate it. Empirical studies show that the policies we use to collect such data have a strong impact on the accuracy of the system. However, there is little theoretical understanding of this phenomenon. In this paper we provide the first theoretical explanation of the accuracy gap between the most popular collection policies: the non-adaptive uniform allocation, and the adaptive uncertainty sampling and information gain maximisation. To do so, we propose a novel representation of the collection process in terms of random walks. Then, we use this tool to derive lower and upper bounds on the accuracy of the policies. With these bounds, we are able to quantify the advantage that the two adaptive policies have over the non-adaptive one for the first time.
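
The three ideas named in the abstract (uniform allocation, uncertainty sampling, and the random-walk view of data collection) can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not taken from the paper: tasks are binary, every worker has the same known accuracy `p`, and the prior over classes is uniform, so that the posterior log-odds of each task are proportional to a simple ±1 random walk. The function name `simulate` and all parameter names are hypothetical, and the code does not reproduce the paper's bounds or its information gain policy.

```python
import random


def simulate(policy, num_tasks=200, labels_per_task=5, p=0.7, seed=0):
    """Return the fraction of binary tasks classified correctly.

    Assumption (not from the paper): all workers share one accuracy p,
    so each task's log-odds equal walk[i] * log(p / (1 - p)).
    """
    rng = random.Random(seed)
    truth = [rng.choice((-1, 1)) for _ in range(num_tasks)]
    walk = [0] * num_tasks  # votes for +1 minus votes for -1, per task
    budget = num_tasks * labels_per_task

    for t in range(budget):
        if policy == "uniform":
            # Non-adaptive: round-robin, same number of labels per task.
            i = t % num_tasks
        elif policy == "uncertainty":
            # Adaptive: label the task whose walk is closest to zero
            # (ties go to the lowest index).
            i = min(range(num_tasks), key=lambda j: abs(walk[j]))
        else:
            raise ValueError(f"unknown policy: {policy}")
        # The worker reports the true class with probability p.
        vote = truth[i] if rng.random() < p else -truth[i]
        walk[i] += vote

    # Classify by the sign of the walk; a walk at zero counts as a coin flip.
    correct = 0.0
    for pos, y in zip(walk, truth):
        if pos == 0:
            correct += 0.5
        elif (pos > 0) == (y > 0):
            correct += 1.0
    return correct / num_tasks


if __name__ == "__main__":
    for policy in ("uniform", "uncertainty"):
        runs = [simulate(policy, seed=s) for s in range(20)]
        print(policy, sum(runs) / len(runs))
```

Running the script prints the average accuracy of each policy over 20 seeds with the same total labelling budget, which gives an empirical feel for the gap between adaptive and non-adaptive collection that the paper analyses theoretically.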

Text paper - Accepted Manuscript

More information

Accepted/In Press date: 2018
Venue - Dates: International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018-07-13 - 2018-07-19

Identifiers

Local EPrints ID: 420634
URI: https://eprints.soton.ac.uk/id/eprint/420634
PURE UUID: 490e04a6-030f-46b8-82f7-d15a4ecb2343
ORCID for Edoardo Manino: orcid.org/0000-0003-0028-5440
ORCID for Long Tran-Thanh: orcid.org/0000-0003-1617-8316

Catalogue record

Date deposited: 11 May 2018 16:30
Last modified: 08 Aug 2018 00:31
