The University of Southampton
University of Southampton Institutional Repository

Aligning texts and knowledge bases with semantic sentence simplification

Mrabet, Yassine, Vougiouklis, Pavlos, Kilicoglu, Halil, Gardent, Claire, Demner-Fushman, Dina, Hare, Jonathon and Simperl, Elena (2016) Aligning texts and knowledge bases with semantic sentence simplification. At 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016), United Kingdom. 05 - 08 Sep 2016, pp. 29-36.

Record type: Conference or Workshop Item (Paper)

Abstract

Finding the natural language equivalent of structured data is both a challenging and promising task. In particular, an efficient alignment of knowledge bases with texts would benefit many applications, including natural language generation, information retrieval and text simplification. In this paper, we present an approach to build a dataset of triples aligned with equivalent sentences written in natural language. Our approach consists of three main steps. First, target sentences are annotated automatically with knowledge base (KB) concepts and instances. The triples linking these elements in the KB are extracted as candidate facts to be aligned with the annotated sentence. Second, we use textual mentions referring to the subject and object of these facts to semantically simplify the target sentence via crowdsourcing. Third, the sentences provided by different contributors are post-processed to keep only the most relevant simplifications for the alignment with KB facts. We present different filtering methods, and share the constructed datasets in the public domain. These datasets contain 1050 sentences aligned with 1885 triples. They can be used to train natural language generators as well as semantic or contextual text simplifiers.
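
The first step described in the abstract (extracting the KB triples that link the concepts annotated in a sentence) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Triple` type, the `candidate_facts` function, the `ex:` identifiers and the toy knowledge base are all hypothetical names and data introduced purely for illustration, and the annotation step itself is assumed to have already produced a mapping from textual mentions to KB identifiers.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A KB fact (hypothetical representation, for illustration only)."""
    subject: str
    predicate: str
    obj: str


def candidate_facts(annotations: dict[str, str], kb: set[Triple]) -> list[Triple]:
    """Return the KB triples whose subject and object are both annotated
    in the sentence. `annotations` maps a textual mention to a KB identifier."""
    entities = set(annotations.values())
    return sorted(
        t for t in kb
        if t.subject in entities and t.obj in entities and t.subject != t.obj
    )


# Toy knowledge base; every identifier here is made up.
kb = {
    Triple("ex:Aspirin", "ex:treats", "ex:Headache"),
    Triple("ex:Aspirin", "ex:isA", "ex:Drug"),
    Triple("ex:Paris", "ex:capitalOf", "ex:France"),
}

# Assumed output of the automatic annotation step for one sentence.
annotations = {"aspirin": "ex:Aspirin", "headaches": "ex:Headache"}

facts = candidate_facts(annotations, kb)
print(facts)
```

In this sketch only the `ex:treats` triple survives, because it is the only fact whose subject and object both appear among the sentence's annotations. The mentions that map to them ("aspirin", "headaches") are what a crowdsourced simplification of the sentence would then be built around.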

PDF 06.pdf - Accepted Manuscript
Download (197kB)

More information

Accepted/In Press date: 12 July 2016
e-pub ahead of print date: 6 September 2016
Venue - Dates: 2nd International Workshop on Natural Language Generation and the Semantic Web (WebNLG 2016), United Kingdom, 2016-09-05 - 2016-09-08
Organisations: Web & Internet Science, Vision, Learning and Control

Identifiers

Local EPrints ID: 401445
URI: http://eprints.soton.ac.uk/id/eprint/401445
PURE UUID: d33df7ad-8721-4857-9acc-2b093df2a908
ORCID for Jonathon Hare: orcid.org/0000-0003-2921-4283
ORCID for Elena Simperl: orcid.org/0000-0003-1722-947X

Catalogue record

Date deposited: 17 Oct 2016 13:20
Last modified: 17 Jul 2017 18:02

Contributors

Author: Yassine Mrabet
Author: Pavlos Vougiouklis
Author: Halil Kilicoglu
Author: Claire Gardent
Author: Dina Demner-Fushman
Author: Jonathon Hare
Author: Elena Simperl
