An investigation into the impact of workflow design and aggregation on achieving quality result in crowdsourcing classification tasks
University of Southampton
Bu, Qiong
ce52e778-20d8-466e-afec-fec74620c959
February 2020
Simperl, Elena
40261ae4-c58c-48e4-b78b-5187b10e4f67
Bu, Qiong (2020) An investigation into the impact of workflow design and aggregation on achieving quality result in crowdsourcing classification tasks. University of Southampton, Doctoral Thesis, 183pp.
Record type: Thesis (Doctoral)
Abstract
Microtask crowdsourcing has been applied in many fields over the past decades, but important challenges remain, especially in task and workflow design and in the aggregation methods used to produce a correct result or assess its quality. This research took a deeper look at crowdsourcing classification tasks and explored how task and workflow design affect the quality of the classification result. Using a large online knowledge base and three citizen science projects as examples, it investigated workflow design variations and their impact on classification quality under statistical, probabilistic, and machine learning models for true label inference, so that design principles can be recommended and applied in other citizen science projects or human-computer hybrid systems to improve overall quality. Notably, most existing research on aggregation methods for inferring true labels focuses on simple single-step classification, even though a large portion of classification tasks are not single-step. Only limited research in recent years has examined such multi-step classification tasks, and each study has a domain- or problem-specific focus that makes it difficult to apply to other multi-step classification cases. This research focused on multi-step classification, modelling the classification task as a path-searching problem in a graph, and explored alternative aggregation strategies to infer correct label paths by leveraging established algorithms, from simple majority voting to more sophisticated approaches such as message passing and expectation-maximisation. It also examined alternative workflow designs for classifying objects, using DBpedia entity classification as a case study, and demonstrated the pros and cons of automatic, hybrid, and completely human-based workflows.
As a result, the thesis offers suggestions to task requesters for designing crowdsourcing classification tasks and helps them choose an aggregation method that will achieve a good-quality result.
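The abstract's framing of multi-step classification as label paths in a category hierarchy can be illustrated with a minimal sketch: per-level majority voting over annotators' label paths. This is only the simplest baseline named in the abstract, not the thesis's full method (which also explores message passing and expectation-maximisation); the function name and the example DBpedia-style labels are hypothetical.

```python
from collections import Counter

def aggregate_label_paths(annotations):
    """Aggregate multi-step classification answers by majority vote.

    Each annotation is one annotator's label path through a category
    hierarchy, e.g. ("Agent", "Person", "Athlete"). This illustrative
    baseline picks the most common label at each depth independently.
    """
    if not annotations:
        return ()
    depth = max(len(path) for path in annotations)
    consensus = []
    for level in range(depth):
        # Count votes only from annotators whose path reaches this depth.
        votes = Counter(path[level] for path in annotations if len(path) > level)
        consensus.append(votes.most_common(1)[0][0])
    return tuple(consensus)

paths = [
    ("Agent", "Person", "Athlete"),
    ("Agent", "Person", "Athlete"),
    ("Agent", "Organisation"),
]
print(aggregate_label_paths(paths))  # ('Agent', 'Person', 'Athlete')
```

Note that voting per level independently can yield a path that no single annotator proposed and that may not exist in the hierarchy, which is one reason the thesis considers graph-aware aggregation strategies.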
Text: Thesis - Version of Record
Text: Permission to deposit thesis - form_qb1g13 (Restricted to Repository staff only)
More information
Published date: February 2020
Identifiers
Local EPrints ID: 452348
URI: http://eprints.soton.ac.uk/id/eprint/452348
PURE UUID: 56a140ac-0230-48ee-99b9-1905fec3d10d
Catalogue record
Date deposited: 08 Dec 2021 18:46
Last modified: 16 Mar 2024 09:48