On the efficiency of data collection for crowdsourced classification
The quality of crowdsourced data is often highly variable. For this reason, it is common to collect redundant data and use statistical methods to aggregate it. Empirical studies show that the policies we use to collect such data have a strong impact on the accuracy of the system. However, there is little theoretical understanding of this phenomenon. In this paper we provide the first theoretical explanation of the accuracy gap between the most popular collection policies: the non-adaptive uniform allocation, and the adaptive uncertainty sampling and information gain maximisation. To do so, we propose a novel representation of the collection process in terms of random walks. Then, we use this tool to derive lower and upper bounds on the accuracy of the policies. With these bounds, we are able to quantify the advantage that the two adaptive policies have over the non-adaptive one for the first time.
Manino, Edoardo
Tran-Thanh, Long
Jennings, Nicholas
13 July 2018
Manino, Edoardo, Tran-Thanh, Long and Jennings, Nicholas (2018) On the efficiency of data collection for crowdsourced classification. International Joint Conference on Artificial Intelligence, Stockholm, Sweden. 13 - 19 Jul 2018. 8 pp.
Record type: Conference or Workshop Item (Paper)
Text: paper - Accepted Manuscript
More information
Accepted/In Press date: 2018
Published date: 13 July 2018
Venue - Dates: International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 2018-07-13 - 2018-07-19
Identifiers
Local EPrints ID: 420634
URI: http://eprints.soton.ac.uk/id/eprint/420634
PURE UUID: 490e04a6-030f-46b8-82f7-d15a4ecb2343
Catalogue record
Date deposited: 11 May 2018 16:30
Last modified: 15 Mar 2024 19:50
Contributors
Author: Edoardo Manino
Author: Long Tran-Thanh
Author: Nicholas Jennings