The University of Southampton
University of Southampton Institutional Repository

Neural generation of textual summaries from knowledge base triples

University of Southampton
Vougiouklis, Pavlos
4cd0a8f1-c5e2-4ba2-8dcd-753db616b215
Simperl, Elena
40261ae4-c58c-48e4-b78b-5187b10e4f67

Vougiouklis, Pavlos (2019) Neural generation of textual summaries from knowledge base triples. University of Southampton, Doctoral Thesis, 149pp.

Record type: Thesis (Doctoral)

Abstract

Most people need textual or visual interfaces in order to make sense of Semantic Web data. In this thesis, we investigate the problem of generating natural language summaries for structured data encoded as triples using neural networks.

We propose an end-to-end trainable architecture that encodes the information from a set of triples into a vector of fixed dimensionality and generates a textual summary by conditioning the output on this encoded vector. In order to both train and evaluate our approach, we explore different methodologies for building the required data-to-text corpora. We initially focus our attention on the generation of biographies. Using methods for both automatic and human evaluation, we demonstrate that our technique is capable of scaling to domains with challenging vocabulary sizes of over 400k words.

Given the promising results of our approach on biographies, we explore its applicability to the generation of open-domain Wikipedia summaries in two under-resourced languages, Arabic and Esperanto. We propose an adaptation of our original encoder-decoder architecture that outperforms a set of strong baselines of different kinds. Furthermore, we conduct a set of community studies in order to measure the usability of the generated content for Wikipedia readers and editors. The targeted communities ranked our generated text close to the expected standards of Wikipedia. In addition, we found that editors are likely to reuse a large portion of the generated summaries, which emphasises the usefulness of our approach to the communities involved.

Finally, we extend the original model with a pointer mechanism that enables it to jointly learn to verbalise the content of the triples in a number of different ways while retaining the ability to generate regular words from a fixed target vocabulary. We evaluate performance on a dataset encompassing the entirety of English Wikipedia. Results from both automatic and human evaluation highlight the superiority of the pointer-based model over our original encoder-decoder architecture and a set of competitive baselines.

Text
Final Thesis - Version of Record
Available under License University of Southampton Thesis Licence.

More information

Published date: January 2019

Identifiers

Local EPrints ID: 428045
URI: https://eprints.soton.ac.uk/id/eprint/428045
PURE UUID: b9bcbfca-40fe-484e-b21a-4b469e7b66e6
ORCID for Elena Simperl: orcid.org/0000-0003-1722-947X

Catalogue record

Date deposited: 07 Feb 2019 17:30
Last modified: 14 Mar 2019 01:35

Contributors

Author: Pavlos Vougiouklis
Thesis advisor: Elena Simperl
