Using natural language generation to bootstrap missing Wikipedia articles: A human-centric perspective
Kaffee, Lucie-Aimée
Vougiouklis, Pavlos
Simperl, Elena
18 February 2021
Kaffee, Lucie-Aimée, Vougiouklis, Pavlos and Simperl, Elena (2021) Using natural language generation to bootstrap missing Wikipedia articles: A human-centric perspective. Semantic Web.
Abstract
Nowadays, natural language generation (NLG) is used in everything from news reporting and chatbots to social media management. Recent advances in machine learning have made it possible to train NLG systems that seek to achieve human-level performance in text writing and summarisation. In this paper, we propose such a system in the context of Wikipedia and evaluate it with Wikipedia readers and editors. Our solution builds upon the ArticlePlaceholder, a tool used in 14 under-resourced Wikipedia language versions, which displays structured data from the Wikidata knowledge base on empty Wikipedia pages. We train a neural network to generate an introductory sentence from the Wikidata triples shown by the ArticlePlaceholder, and explore how Wikipedia users engage with it. The evaluation, which includes an automatic, a judgement-based, and a task-based component, shows that the summary sentences score well in terms of perceived fluency and appropriateness for Wikipedia, and can help editors bootstrap new articles. It also hints at several potential implications of using NLG solutions in Wikipedia at large, including content quality, trust in technology, and algorithmic transparency.
More information
Published date: 18 February 2021
Identifiers
Local EPrints ID: 449718
URI: http://eprints.soton.ac.uk/id/eprint/449718
ISSN: 1570-0844
PURE UUID: a863b39d-6f75-4d4c-8a91-8a1c67ed89af
Catalogue record
Date deposited: 11 Jun 2021 16:33
Last modified: 16 Mar 2024 11:06
Contributors
Author:
Lucie-Aimée Kaffee
Author:
Pavlos Vougiouklis