Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders
Kaffee, Lucie-Aimée, Elsahar, Hady, Vougiouklis, Pavlos, Gravier, Christophe, Laforest, Frederique, Hare, Jonathon and Simperl, Elena (2018) Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders. In The Semantic Web. vol. 10843, Springer. pp. 319-334. (doi:10.1007/978-3-319-93417-4_21).
Record type: Conference or Workshop Item (Paper)
Abstract
While Wikipedia exists in 287 languages, its content is unevenly distributed among them. It is therefore of utmost social and cultural importance to focus efforts on languages whose speakers have access to only limited Wikipedia content. We investigate how to support these communities by generating summaries for Wikipedia articles in underserved languages, given structured data as input.
We focus on an important vehicle for such summaries: ArticlePlaceholders, dynamically generated content pages in underserved Wikipedias. They enable native speakers to access existing information in Wikidata. To extend these ArticlePlaceholders, we present a system that processes the knowledge base (KB) triples provided by the ArticlePlaceholder and generates a comprehensible textual summary. We employ this data-driven approach with the goal of understanding how well it matches the communities' needs in two languages that are underserved on the Web: Arabic, a language with a large community but disproportionately limited access to knowledge online, and Esperanto, an easy-to-learn artificial language whose Wikipedia content is maintained by a small but devoted community. With the help of Arabic and Esperanto Wikipedians, we conduct a study that evaluates not only the quality of the generated text but also the usefulness of our end system for any underserved Wikipedia language version.
Text: ESWC_Language_Gap (Accepted Manuscript)
More information
Accepted/In Press date: 16 March 2018
e-pub ahead of print date: 3 June 2018
Venue and dates: Extended Semantic Web Conference 2018, Heraklion, Crete, Greece, 2018-06-03 to 2018-06-07
Identifiers
Local EPrints ID: 419727
URI: http://eprints.soton.ac.uk/id/eprint/419727
PURE UUID: 7915013a-3d4e-4410-a353-261b98acf9eb
Catalogue record
Date deposited: 20 Apr 2018 16:30
Last modified: 16 Mar 2024 06:54
Contributors
Author: Lucie-Aimée Kaffee
Author: Hady Elsahar
Author: Pavlos Vougiouklis
Author: Christophe Gravier
Author: Frederique Laforest
Author: Jonathon Hare
Author: Elena Simperl