Learning to generate Wikipedia summaries for underserved languages from Wikidata
Kaffee, Lucie-Aimée, Elsahar, Hady, Vougiouklis, Pavlos, Gravier, Christophe, Laforest, Frederique, Hare, Jonathon and Simperl, Elena (2018) Learning to generate Wikipedia summaries for underserved languages from Wikidata. 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, United States. 01 - 06 Jun 2018. 6 pp. (In Press)
Record type: Conference or Workshop Item (Paper)
Abstract
While Wikipedia exists in 287 languages, its content is unevenly distributed among them. In this work, we investigate the generation of open-domain Wikipedia summaries in underserved languages using structured data from Wikidata. To this end, we propose a neural network architecture equipped with copy actions that learns to generate single-sentence and comprehensible textual summaries from Wikidata triples. We demonstrate the effectiveness of the proposed approach by evaluating it against a set of baselines on two languages of different natures: Arabic, a morphologically rich language with a larger vocabulary than English, and Esperanto, a constructed language known for its easy acquisition.
Text: NAACL_Short_Textual_ArticlePlaceholder - Accepted Manuscript (restricted to registered users only)
More information
Accepted/In Press date: 2018
Venue - Dates: 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, United States, 2018-06-01 - 2018-06-06
Identifiers
Local EPrints ID: 419728
URI: http://eprints.soton.ac.uk/id/eprint/419728
PURE UUID: 145efe15-0f94-4fd1-ab8a-3ec63c054dc0
Catalogue record
Date deposited: 20 Apr 2018 16:30
Last modified: 16 Mar 2024 03:50
Contributors
Author: Lucie-Aimée Kaffee
Author: Hady Elsahar
Author: Pavlos Vougiouklis
Author: Christophe Gravier
Author: Frederique Laforest
Author: Jonathon Hare
Author: Elena Simperl