T-REx: A large scale alignment of natural language with knowledge base triples

Alignments between natural language and Knowledge Base (KB) triples are an essential prerequisite for training machine learning approaches employed in a variety of Natural Language Processing problems. These include Relation Extraction, KB Population, Question Answering and Natural Language Generation from KB triples. Available datasets that provide those alignments are plagued by significant shortcomings: they are of limited size, they exhibit a restricted predicate coverage, and/or they are of unreported quality. To alleviate these shortcomings, we present T-REx, a dataset of large scale alignments between Wikipedia abstracts and Wikidata triples. T-REx consists of 11 million triples aligned with 3.09 million Wikipedia abstracts (6.2 million sentences). T-REx is two orders of magnitude larger than the largest available alignments dataset and covers 2.5 times more predicates. Additionally, we demonstrate the quality of this language resource through an extensive crowdsourcing evaluation. T-REx is publicly available at: https://w3id.org/t-rex.

Pages: 3448-3452

Authors: Elsahar, Hady; Vougiouklis, Pavlos; Remaci, Arslen; Gravier, Christophe; Hare, Jonathon; Simperl, Elena; Laforest, Frederique

Year: 2019

Elsahar, Hady, Vougiouklis, Pavlos, Remaci, Arslen, Gravier, Christophe, Hare, Jonathon, Simperl, Elena and Laforest, Frederique (2019) T-REx: A large scale alignment of natural language with knowledge base triples. In LREC 2018 - 11th International Conference on Language Resources and Evaluation. pp. 3448-3452.

Record type: Conference or Workshop Item (Paper)

More information

Submitted date: 2 October 2017

Accepted/In Press date: 13 December 2017

Published date: 2019

Identifiers

Local EPrints ID: 417557

URI: http://eprints.soton.ac.uk/id/eprint/417557

PURE UUID: 0e5eae34-650d-4829-bb16-c37e0acac81b

ORCID for Jonathon Hare: orcid.org/0000-0003-2921-4283

ORCID for Elena Simperl: orcid.org/0000-0003-1722-947X

Catalogue record

Date deposited: 02 Feb 2018 17:31

Last modified: 16 Mar 2024 03:50
