University of Southampton Institutional Repository

Using natural language generation to bootstrap missing Wikipedia articles: A human-centric perspective

Kaffee, Lucie-Aimée, Vougiouklis, Pavlos and Simperl, Elena (2021) Using natural language generation to bootstrap missing Wikipedia articles: A human-centric perspective. Semantic Web.

Record type: Article

Abstract

Nowadays, natural language generation (NLG) is used in everything from news reporting and chatbots to social media management. Recent advances in machine learning have made it possible to train NLG systems that seek to achieve human-level performance in text writing and summarisation. In this paper, we propose such a system in the context of Wikipedia and evaluate it with Wikipedia readers and editors. Our solution builds upon the ArticlePlaceholder, a tool used in 14 under-resourced Wikipedia language versions, which displays structured data from the Wikidata knowledge base on empty Wikipedia pages. We train a neural network to generate an introductory sentence from the Wikidata triples shown by the ArticlePlaceholder, and explore how Wikipedia users engage with it. The evaluation, which includes an automatic, a judgement-based, and a task-based component, shows that the summary sentences score well in terms of perceived fluency and appropriateness for Wikipedia, and can help editors bootstrap new articles. It also hints at several potential implications of using NLG solutions in Wikipedia at large, including content quality, trust in technology, and algorithmic transparency.

This record has no associated files available for download.

More information

Published date: 18 February 2021

Identifiers

Local EPrints ID: 449718
URI: http://eprints.soton.ac.uk/id/eprint/449718
ISSN: 1570-0844
PURE UUID: a863b39d-6f75-4d4c-8a91-8a1c67ed89af
ORCID for Lucie-Aimée Kaffee: orcid.org/0000-0002-1514-8505
ORCID for Elena Simperl: orcid.org/0000-0003-1722-947X

Catalogue record

Date deposited: 11 Jun 2021 16:33
Last modified: 16 Mar 2024 11:06

Contributors

Author: Lucie-Aimée Kaffee
Author: Pavlos Vougiouklis
Author: Elena Simperl

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.