An investigation into the impact of workflow design and aggregation on achieving quality result in crowdsourcing classification tasks
University of Southampton
Bu, Qiong
ce52e778-20d8-466e-afec-fec74620c959
February 2020
Simperl, Elena
40261ae4-c58c-48e4-b78b-5187b10e4f67
Bu, Qiong (2020) An investigation into the impact of workflow design and aggregation on achieving quality result in crowdsourcing classification tasks. University of Southampton, Doctoral Thesis, 183pp.
Record type: Thesis (Doctoral)
Abstract
Microtask crowdsourcing has been applied in many fields over the past decades, but important challenges remain, especially in task and workflow design and in the aggregation methods used to produce a correct result or assess its quality. This research took a deeper look at crowdsourcing classification tasks and explored how task and workflow design affect the quality of the classification result. Using a large online knowledge base and three citizen science projects as examples, it investigated workflow design variations and their impact on classification quality under statistical, probabilistic, and machine learning models for true label inference, so that design principles can be recommended and applied in other citizen science projects or human-computer hybrid systems to improve overall quality. Notably, most existing research on aggregation methods for inferring true labels focuses on simple single-step classification, even though a large portion of classification tasks are not single-step. Only limited research in recent years has examined such multi-step classification tasks, and each study has a domain- or problem-specific focus that makes it difficult to apply to other multi-step classification cases. This research focused on multi-step classification, modelling the classification task as a path-searching problem in a graph, and explored alternative aggregation strategies to infer correct label paths by leveraging established algorithms, from simple majority voting to more sophisticated approaches such as message passing and expectation-maximisation. It also examined alternative workflow designs for classifying objects, using DBpedia entity classification as a case study, and demonstrated the pros and cons of automatic, hybrid, and completely human-based workflows.
As a result, the thesis offers suggestions to task requesters for designing crowdsourcing classification tasks and helps them choose an aggregation method that will achieve a good-quality result.
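The abstract's framing of multi-step classification as label paths in a category hierarchy can be illustrated with a minimal sketch: per-level majority voting over annotators' label paths. This is only the simplest baseline named in the abstract, not the thesis's full method (which also explores message passing and expectation-maximisation); the function name and the example DBpedia-style labels are hypothetical.

```python
from collections import Counter

def aggregate_label_paths(annotations):
    """Aggregate multi-step classification answers by majority vote.

    Each annotation is one annotator's label path through a category
    hierarchy, e.g. ("Agent", "Person", "Athlete"). This illustrative
    baseline picks the most common label at each depth independently.
    """
    if not annotations:
        return ()
    depth = max(len(path) for path in annotations)
    consensus = []
    for level in range(depth):
        # Count votes only from annotators whose path reaches this depth.
        votes = Counter(path[level] for path in annotations if len(path) > level)
        consensus.append(votes.most_common(1)[0][0])
    return tuple(consensus)

paths = [
    ("Agent", "Person", "Athlete"),
    ("Agent", "Person", "Athlete"),
    ("Agent", "Organisation"),
]
print(aggregate_label_paths(paths))  # ('Agent', 'Person', 'Athlete')
```

Note that voting per level independently can yield a path that no single annotator proposed and that may not exist in the hierarchy, which is one reason the thesis considers graph-aware aggregation strategies.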
Text: Thesis - Version of Record
Text: Permission to deposit thesis - form_qb1g13 (Restricted to Repository staff only)
More information
Published date: February 2020
Identifiers
Local EPrints ID: 452348
URI: http://eprints.soton.ac.uk/id/eprint/452348
PURE UUID: 56a140ac-0230-48ee-99b9-1905fec3d10d
Catalogue record
Date deposited: 08 Dec 2021 18:46
Last modified: 16 Mar 2024 09:48