The University of Southampton
University of Southampton Institutional Repository

An investigation into the impact of workflow design and aggregation on achieving quality result in crowdsourcing classification tasks


Bu, Qiong (2020) An investigation into the impact of workflow design and aggregation on achieving quality result in crowdsourcing classification tasks. University of Southampton, Doctoral Thesis, 183pp.

Record type: Thesis (Doctoral)

Abstract

Microtask crowdsourcing has been applied in many fields over the past decades, but important challenges remain not fully addressed, especially in task and workflow design and in aggregation methods that help produce a correct result or assess its quality. This research took a deeper look at crowdsourcing classification tasks and explored how task and workflow design affect the quality of the classification result. It used a large online knowledge base and three citizen science projects as examples to investigate workflow design variations and their impact on the quality of the classification result, based on statistical, probabilistic, or machine learning models for true label inference, so that design principles can be recommended and applied in other citizen science projects or other human-computer hybrid systems to improve overall quality. Notably, most existing research on aggregation methods for inferring true labels focuses on simple single-step classification, although a large portion of classification tasks are not single-step. Only limited research has looked into such multiple-step classification tasks in recent years, and each study has a domain-specific or problem-specific focus, which makes it difficult to apply to other multiple-step classification cases. This research focused on multiple-step classification, modeling the classification task as a path-searching problem in a graph, and explored alternative aggregation strategies to infer correct label paths by leveraging established algorithms, from simple majority voting to more sophisticated ones such as message passing and expectation-maximisation. It also examined alternative workflow designs for classifying objects, using DBpedia entity classification as a case study, and demonstrated the pros and cons of automatic, hybrid, and completely human-based workflows.
As a result, this research is able to offer suggestions to task requesters on crowdsourcing classification task design and to help them choose an aggregation method that will achieve a good-quality result.
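
To make the idea of inferring a label path concrete, the following is a minimal sketch of step-wise majority voting over worker-submitted paths. It is an illustration only and is not taken from the thesis: the function name majority_vote_path, the strict-majority stopping rule, and the example class names are all assumptions chosen for this example.

```python
from collections import Counter


def majority_vote_path(worker_paths):
    """Infer one label path from several workers' paths (hypothetical helper).

    At each depth, only workers who agreed with the consensus so far vote.
    The path is extended while one label holds a strict majority of them.
    """
    consensus = []
    active = [tuple(path) for path in worker_paths]
    depth = 0
    while active:
        # Count the labels chosen at this depth by workers still in agreement.
        votes = Counter(p[depth] for p in active if len(p) > depth)
        if not votes:
            break  # nobody classified any deeper
        label, count = votes.most_common(1)[0]
        if count * 2 <= len(active):
            break  # no strict majority, so stop at the current node
        consensus.append(label)
        active = [p for p in active if len(p) > depth and p[depth] == label]
        depth += 1
    return consensus


# Illustrative input: five workers classify one entity along a class hierarchy.
example_paths = [
    ["Agent", "Person", "Artist"],
    ["Agent", "Person", "Athlete"],
    ["Agent", "Person", "Artist"],
    ["Agent", "Organisation"],
    ["Place"],
]
print(majority_vote_path(example_paths))
```

Voting level by level keeps partial agreement: workers who disagree on the final class can still contribute to the higher-level labels they share. More sophisticated strategies such as message passing or expectation-maximisation would additionally weight workers by their estimated reliability, which this sketch does not attempt.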

Text
Thesis - Version of Record
Available under License University of Southampton Thesis Licence.

More information

Published date: February 2020

Identifiers

Local EPrints ID: 452348
URI: http://eprints.soton.ac.uk/id/eprint/452348
PURE UUID: 56a140ac-0230-48ee-99b9-1905fec3d10d
ORCID for Elena Simperl: orcid.org/0000-0003-1722-947X

Catalogue record

Date deposited: 08 Dec 2021 18:46
Last modified: 13 Dec 2021 03:10

Contributors

Author: Qiong Bu
Thesis advisor: Elena Simperl
