The University of Southampton
University of Southampton Institutional Repository

Structuring the world’s knowledge: socio-technical processes and data quality in Wikidata


Piscopo, Alessandro (2019) Structuring the world’s knowledge: socio-technical processes and data quality in Wikidata. University of Southampton, Doctoral Thesis, 210 pp.

Record type: Thesis (Doctoral)

Abstract

Wikidata is a collaborative knowledge graph by the Wikimedia Foundation which has undergone impressive growth since its launch in 2012: it has gathered a user pool of almost two hundred thousand editors, who have contributed data about more than 50 million entities. In the fashion of other Wikimedia projects, it is completely bottom-up, i.e. everything within the knowledge graph is created and maintained by its users.

These features have drawn the attention of a growing number of researchers and practitioners from several fields. Nevertheless, research about collaboration processes in Wikidata is still scarce. This thesis addresses this gap by analysing the socio-technical fabric of Wikidata and how it affects the quality of its data. In particular, it makes a threefold contribution: (i) it evaluates two previously unexamined aspects of the quality of Wikidata, i.e. provenance and its ontology; (ii) it is the first to investigate the effects of algorithmic contributions, i.e. bots, on Wikidata quality; (iii) it looks at emerging editor activity patterns in Wikidata and their effects on outcome quality.

Our findings show that bots are important for the quality of the knowledge graph, although their work needs to be continuously monitored, since they can introduce different sorts of errors at a large scale. Regarding human editors, a more diverse user pool (in terms of tenure and focus of activity) seems to be associated with higher quality. Finally, two roles emerge from the editing patterns of Wikidata users: leaders and contributors. Leaders perform more edits and have a more prominent role within the community. They are also more involved in the maintenance of the Wikidata schema, their activity being positively related to the growth of its taxonomy.

This thesis contributes to the understanding of collaborative processes and data quality in Wikidata. Further studies should be carried out in order to confirm whether and to what extent its insights are generalisable to other collaborative knowledge engineering platforms.

Text
Final thesis - Version of Record
Available under License University of Southampton Thesis Licence.

More information

Published date: October 2019

Identifiers

Local EPrints ID: 438873
URI: http://eprints.soton.ac.uk/id/eprint/438873
PURE UUID: 8c6a448f-dba4-4749-9c78-aadc91741f5f
ORCID for Alessandro Piscopo: orcid.org/0000-0002-0362-4826
ORCID for Elena Simperl: orcid.org/0000-0003-1722-947X

Catalogue record

Date deposited: 26 Mar 2020 17:30
Last modified: 16 Mar 2024 05:40

Contributors

Author: Alessandro Piscopo
Thesis advisor: Elena Simperl
