The University of Southampton
University of Southampton Institutional Repository

The domain agnostic generation of natural language explanations from provenance graphs

Richardson, Darren Paul (2018) The domain agnostic generation of natural language explanations from provenance graphs. University of Southampton, Doctoral Thesis, 148pp.

Record type: Thesis (Doctoral)

Abstract

In a data-driven world, being able to record where data was derived from, and by whom, is key. The way to represent this information, provenance, on the Web has been standardised by the World Wide Web Consortium as PROV. Furthermore, once provenance has been recorded, it is often necessary to be able to present it back to users. In the state of the art, the interfaces to such provenance tend to be diagrammatic, or rely on very application-specific template-based natural language generation. Both of these approaches have their drawbacks, motivating the search for techniques for generating natural language explanations from domain-generic provenance graphs. This work presents several contributions to the state of the art in this regard. Firstly, it presents a novel template-based architecture for natural language generation. This is followed by the novel application of set-cover optimisation techniques to the challenge of sentence selection. Thirdly, this work extends previous research into the role of URIs for lexicalising Linked Data resources, making use of the specific nature of PROV instance data to inform the heuristics used. Fourthly, these techniques are evaluated in a user study demonstrating that they improve upon the state of the art across the three dimensions of grammatical correctness, fluency, and comprehensibility. This evaluation also showed that the participants preferred the sentences generated using these techniques 56.4% of the time.

Following on from these advances, an investigation is conducted into how to structure larger natural language explanations of provenance graphs. This is done by inviting a number of provenance experts to describe a sequence of provenance graphs presented diagrammatically, and analysing the way they approach this task. This reveals that the responses of the experts correlated strongly with the visual layout of the diagrams, and also that the experts were split as to whether to structure those explanations in a chronological or anti-chronological order. Finally, a further study was conducted to investigate how chronology affects the perceived quality of the generated natural language explanations, revealing that in aggregate the participants considered the chronological ordering to be more logical. This dissertation concludes with a summary of the contributions made to the state of the art and a number of proposed areas for future research.

Text
Final thesis - Version of Record
Available under License University of Southampton Thesis Licence.

More information

Published date: June 2018

Identifiers

Local EPrints ID: 423465
URI: http://eprints.soton.ac.uk/id/eprint/423465
PURE UUID: 9fbaa2c3-48e3-47f4-99bb-a63d1c83d3c7
ORCID for Luc Moreau: orcid.org/0000-0002-3494-120X
ORCID for Paul Smart: orcid.org/0000-0001-9989-5307
ORCID for Sarvapali Ramchurn: orcid.org/0000-0001-9686-4302
ORCID for Wendy Hall: orcid.org/0000-0003-4327-7811
ORCID for Monica Schraefel: orcid.org/0000-0002-9061-7957
ORCID for Gianluca Correndo: orcid.org/0000-0003-3335-5759

Catalogue record

Date deposited: 24 Sep 2018 16:30
Last modified: 16 Mar 2024 03:44

Contributors

Author: Darren Paul Richardson
Thesis advisor: Luc Moreau
Thesis advisor: Paul Smart
Thesis advisor: Nigel Shadbolt
Thesis advisor: Sarvapali Ramchurn
Thesis advisor: Enrico Costanza
Thesis advisor: Yang Yang
Thesis advisor: Igor Popov
Thesis advisor: Wendy Hall
Thesis advisor: Hugh Glaser
Thesis advisor: Monica Schraefel
Thesis advisor: Ian Millard
Thesis advisor: Zahra Lalani
Thesis advisor: Daniel Smith
Thesis advisor: Tim Berners-Lee
Thesis advisor: Gianluca Correndo
Thesis advisor: Max Van Kleek
Thesis advisor: Saad Alsubaie
