Learning to generate Wikipedia summaries for underserved languages from Wikidata

While Wikipedia exists in 287 languages, its content is unevenly distributed among them. In this work, we investigate the generation of open-domain Wikipedia summaries in underserved languages using structured data from Wikidata. To this end, we propose a neural network architecture equipped with copy actions that learns to generate comprehensible single-sentence textual summaries from Wikidata triples. We demonstrate the effectiveness of the proposed approach by evaluating it against a set of baselines on two languages of different natures: Arabic, a morphologically rich language with a larger vocabulary than English, and Esperanto, a constructed language known for its easy acquisition.

Kaffee, Lucie-Aimée

8975c12f-9033-47ed-a2eb-b674b707c2ac

Elsahar, Hady

04528e31-9e9e-4de3-99ce-b6221889e912

Vougiouklis, Pavlos

4cd0a8f1-c5e2-4ba2-8dcd-753db616b215

Gravier, Christophe

3d1a8495-afbd-4a61-b19b-a00036d4e74b

Laforest, Frederique

f61f682e-55a5-4626-a8d6-52aa2f3809d6

Hare, Jonathon

65ba2cda-eaaf-4767-a325-cd845504e5a9

Simperl, Elena

40261ae4-c58c-48e4-b78b-5187b10e4f67

Kaffee, Lucie-Aimée, Elsahar, Hady, Vougiouklis, Pavlos, Gravier, Christophe, Laforest, Frederique, Hare, Jonathon and Simperl, Elena (2018) Learning to generate Wikipedia summaries for underserved languages from Wikidata. 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, United States. 01 - 06 Jun 2018. 6 pp. (In Press)

Record type: Conference or Workshop Item (Paper)

More information

Accepted/In Press date: 2018

Venue - Dates: 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, United States, 2018-06-01 - 2018-06-06
