The domain agnostic generation of natural language explanations from provenance graphs

In a data-driven world, being able to record where data was derived from, and by whom, is key. The way to represent this information, provenance, on the Web has been standardised by the World Wide Web Consortium as PROV. Once provenance has been recorded, it is often necessary to present it back to users. In the state of the art, the interfaces to such provenance tend to be diagrammatic, or rely on very application-specific template-based natural language generation. Both of these approaches have their drawbacks, motivating the search for techniques for generating natural language explanations from domain-generic provenance graphs. This work presents several contributions to the state of the art in this regard. Firstly, it presents a novel template-based architecture for natural language generation. Secondly, it presents the novel application of set-cover optimisation techniques to the challenge of sentence selection. Thirdly, it extends previous research into the role of URIs for lexicalising Linked Data resources, making use of the specific nature of PROV instance data to inform the heuristics used. Fourthly, these techniques are evaluated in a user study demonstrating that they improve upon the state of the art across the three dimensions of grammatical correctness, fluency, and comprehensibility. This evaluation also showed that the participants preferred the sentences generated using these techniques 56.4% of the time. Following on from these advances, an investigation is conducted into how to structure larger natural language explanations of provenance graphs. This is done by inviting a number of provenance experts to describe a sequence of provenance graphs presented diagrammatically, and analysing the way they approach this task. This reveals that the responses of the experts correlated strongly with the visual layout of the diagrams, and also that the experts were split as to whether to structure those explanations in chronological or anti-chronological order. Finally, a further study was conducted to investigate how chronology affects the perceived quality of the generated natural language explanations, revealing that in aggregate the participants considered the chronological ordering to be more logical. This dissertation concludes with a summary of the contributions made to the state of the art and proposes a number of possible areas for future research.

University of Southampton

Richardson, Darren Paul

f55f06e8-4f92-4399-b365-558b4e64d65d

June 2018

Moreau, Luc

033c63dd-3fe9-4040-849f-dfccbe0406f8

Smart, Paul

cd8a3dbf-d963-4009-80fb-76ecc93579df

Shadbolt, Nigel

5c5acdf4-ad42-49b6-81fe-e9db58c2caf7

Ramchurn, Sarvapali

1d62ae2a-a498-444e-912d-a6082d3aaea3

Costanza, Enrico

0868f119-c42e-4b5f-905f-fe98c1beeded

Yang, Yang

4f250291-4405-49b3-a662-eb9810e00415

Popov, Igor

517af6d0-e80b-45fd-89ef-9da71a09bd6f

Hall, Wendy

11f7f8db-854c-4481-b1ae-721a51d8790c

Glaser, Hugh

df88ca22-a72f-4fb6-9784-6578737d8af4

Schraefel, Monica

ac304659-1692-47f6-b892-15113b8c929f

Millard, Ian

0cc12fdc-51b4-460a-a773-01b0975d2a71

Lalani, Zahra

fe90e47d-c220-43aa-9b61-a834a403e7a8

Smith, Daniel

8d05522d-e91e-4aa7-8972-e362e73f005c

Berners-Lee, Tim

5a589ebb-05c1-43fa-b49e-c0b70739b3dd

Correndo, Gianluca

fea0843a-6d4a-4136-8784-0d023fcde3e2

Van Kleek, Max

4d869656-cd47-4cdf-9a4f-697fa9ba4105

Alsubaie, Saad

dc2c6aee-a9af-439d-9fb8-cf5c3091b9b4

Richardson, Darren Paul (2018) The domain agnostic generation of natural language explanations from provenance graphs. University of Southampton, Doctoral Thesis, 148pp.

Record type: Thesis (Doctoral)

Abstract

Text

Final thesis - Version of Record

Available under License University of Southampton Thesis Licence.

More information

Published date: June 2018

Identifiers

Local EPrints ID: 423465

URI: http://eprints.soton.ac.uk/id/eprint/423465

PURE UUID: 9fbaa2c3-48e3-47f4-99bb-a63d1c83d3c7

ORCID for Luc Moreau:

orcid.org/0000-0002-3494-120X

ORCID for Paul Smart:

orcid.org/0000-0001-9989-5307

ORCID for Sarvapali Ramchurn:

orcid.org/0000-0001-9686-4302

ORCID for Wendy Hall:

orcid.org/0000-0003-4327-7811

ORCID for Monica Schraefel:

orcid.org/0000-0002-9061-7957

ORCID for Gianluca Correndo:

orcid.org/0000-0003-3335-5759

Catalogue record

Date deposited: 24 Sep 2018 16:30

Last modified: 16 Mar 2024 03:44

Contributors

Author: Darren Paul Richardson

Thesis advisor: Luc Moreau

Thesis advisor: Paul Smart

Thesis advisor: Nigel Shadbolt

Thesis advisor: Sarvapali Ramchurn

Thesis advisor: Enrico Costanza

Thesis advisor: Yang Yang

Thesis advisor: Igor Popov

Thesis advisor: Wendy Hall

Thesis advisor: Hugh Glaser

Thesis advisor: Monica Schraefel

Thesis advisor: Ian Millard

Thesis advisor: Zahra Lalani

Thesis advisor: Daniel Smith

Thesis advisor: Tim Berners-Lee

Thesis advisor: Gianluca Correndo

Thesis advisor: Max Van Kleek

Thesis advisor: Saad Alsubaie
