Structuring the world’s knowledge: socio-technical processes and data quality in Wikidata
Wikidata is a collaborative knowledge graph run by the Wikimedia Foundation that has grown impressively since its launch in 2012: it has gathered a user pool of almost two hundred thousand editors, who have contributed data about more than 50 million entities. Like other Wikimedia projects, it is entirely bottom-up, i.e. everything within the knowledge graph is created and maintained by its users.
These features have drawn the attention of a growing number of researchers and practitioners from several fields. Nevertheless, research on collaboration processes in Wikidata is still scarce. This thesis addresses this gap by analysing the socio-technical fabric of Wikidata and how it affects the quality of its data. In particular, it makes a threefold contribution: (i.) it evaluates two previously unexamined aspects of the quality of Wikidata, i.e. its provenance and its ontology; (ii.) it is the first to investigate the effects of algorithmic contributions, i.e. bots, on Wikidata quality; (iii.) it looks at emerging editor activity patterns in Wikidata and their effects on outcome quality.
Our findings show that bots are important for the quality of the knowledge graph, although their work needs to be continuously monitored, since they can potentially introduce different sorts of errors at a large scale. Regarding human editors, a more diverse user pool, in terms of tenure and focus of activity, seems to be associated with higher quality. Finally, two roles emerge from the editing patterns of Wikidata users: leaders and contributors. Leaders perform more edits and have a more prominent role within the community. They are also more involved in the maintenance of the Wikidata schema, and their activity is positively related to the growth of its taxonomy.
This thesis contributes to the understanding of collaborative processes and data quality in Wikidata. Further studies are needed to confirm whether, and to what extent, its insights are generalisable to other collaborative knowledge engineering platforms.
University of Southampton
Piscopo, Alessandro
October 2019
Simperl, Elena
Piscopo, Alessandro (2019) Structuring the world’s knowledge: socio-technical processes and data quality in Wikidata. University of Southampton, Doctoral Thesis, 210pp.
Record type: Thesis (Doctoral)
Text: Final thesis (Version of Record)
More information
Published date: October 2019
Identifiers
Local EPrints ID: 438873
URI: http://eprints.soton.ac.uk/id/eprint/438873
PURE UUID: 8c6a448f-dba4-4749-9c78-aadc91741f5f
Catalogue record
Date deposited: 26 Mar 2020 17:30
Last modified: 16 Mar 2024 05:40
Contributors
Author: Alessandro Piscopo