The University of Southampton
University of Southampton Institutional Repository

Using microtasks to crowdsource DBpedia entity classification: A study in workflow design


Bu, Qiong, Simperl, Elena, Zerr, Sergej and Li, Yunjia (2018) Using microtasks to crowdsource DBpedia entity classification: A study in workflow design. Semantic Web, 9 (3), 337-354. (doi:10.3233/SW-170261).

Record type: Article

Abstract

DBpedia is at the core of the Linked Open Data Cloud and is widely used in research and applications. However, it is far from perfect. Its content suffers from many flaws, which result from factual errors inherited from Wikipedia or from incomplete mappings from Wikipedia infoboxes to the DBpedia ontology. In this work we focus on one class of such problems: untyped entities. We propose a hierarchical tree-based approach to categorize DBpedia entities according to the DBpedia ontology using human computation and paid microtasks. We analyse the main dimensions of the crowdsourcing exercise in depth in order to derive suggestions for workflow design, and we study three different workflows with automatic and hybrid prediction mechanisms that select possible candidates for the most specific category from the DBpedia ontology. To test our approach, we ran experiments on CrowdFlower using a gold standard dataset of 120 previously unclassified entities. In our studies, human-computation-driven approaches generally achieved higher precision at lower cost than workflows with automatic predictors. However, each of the tested workflows has its merits, and none of them seems to perform exceptionally well on the entities that the DBpedia Extraction Framework fails to classify. We discuss these findings and their potential implications for the design of effective crowdsourced entity classification in DBpedia and beyond.

Text
paper.pdf - Accepted Manuscript (583kB)

More information

Accepted/In Press date: 28 July 2016
e-pub ahead of print date: 12 April 2018
Organisations: Web & Internet Science

Identifiers

Local EPrints ID: 398733
URI: http://eprints.soton.ac.uk/id/eprint/398733
ISSN: 1570-0844
PURE UUID: c1f60eff-1258-421b-b649-bf95b085b4e6
ORCID for Elena Simperl: orcid.org/0000-0003-1722-947X

Catalogue record

Date deposited: 02 Aug 2016 08:20
Last modified: 15 Mar 2024 05:46

Contributors

Author: Qiong Bu
Author: Elena Simperl
Author: Sergej Zerr
Author: Yunjia Li


