T-REx: A large scale alignment of natural language with knowledge base triples
Pages: 3448-3452
Elsahar, Hady, Vougiouklis, Pavlos, Remaci, Arslen, Gravier, Christophe, Hare, Jonathon, Simperl, Elena and Laforest, Frederique (2019) T-REx: A large scale alignment of natural language with knowledge base triples. In LREC 2018 - 11th International Conference on Language Resources and Evaluation.
Record type: Conference or Workshop Item (Paper)
Abstract
Alignments between natural language and Knowledge Base (KB) triples are an essential prerequisite for training machine learning approaches employed in a variety of Natural Language Processing problems. These include Relation Extraction, KB Population, Question Answering and Natural Language Generation from KB triples. Available datasets that provide those alignments are plagued by significant shortcomings – they are of limited size, they exhibit a restricted predicate coverage, and/or they are of unreported quality. To alleviate these shortcomings, we present T-REx, a dataset of large scale alignments between Wikipedia abstracts and Wikidata triples. T-REx consists of 11 million triples aligned with 3.09 million Wikipedia abstracts (6.2 million sentences). T-REx is two orders of magnitude larger than the largest available alignments dataset and covers 2.5 times more predicates. Additionally, we stress the quality of this language resource thanks to an extensive crowdsourcing evaluation. T-REx is publicly available at: https://w3id.org/t-rex.
Text: 632 - Author's Original
More information
Submitted date: 2 October 2017
Accepted/In Press date: 13 December 2017
Published date: 2019
Identifiers
Local EPrints ID: 417557
URI: http://eprints.soton.ac.uk/id/eprint/417557
PURE UUID: 0e5eae34-650d-4829-bb16-c37e0acac81b
Catalogue record
Date deposited: 02 Feb 2018 17:31
Last modified: 16 Mar 2024 03:50
Contributors
Author: Hady Elsahar
Author: Pavlos Vougiouklis
Author: Arslen Remaci
Author: Christophe Gravier
Author: Jonathon Hare
Author: Elena Simperl
Author: Frederique Laforest