The domain agnostic generation of natural language explanations from provenance graphs
In a data-driven world, being able to record where data was derived from, and by whom, is key. The way to represent this information, known as provenance, on the Web has been standardised by the World Wide Web Consortium as PROV. Once provenance has been recorded, it is often necessary to present it back to users. In the state of the art, the interfaces to such provenance tend to be diagrammatic, or rely on very application-specific template-based natural language generation. Both of these approaches have their drawbacks, motivating the search for techniques for generating natural language explanations from domain-generic provenance graphs. This work presents several contributions to the state of the art in this regard. Firstly, it presents a novel template-based architecture for natural language generation. Secondly, it makes a novel application of set-cover optimisation techniques to the challenge of sentence selection. Thirdly, it extends previous research into the role of URIs for lexicalising Linked Data resources, making use of the specific nature of PROV instance data to inform the heuristics used. Fourthly, these techniques are evaluated in a user study demonstrating that they improve upon the state of the art across the three dimensions of grammatical correctness, fluency, and comprehensibility. This evaluation also showed that the participants preferred the sentences generated using these techniques 56.4% of the time. Following on from these advances, an investigation is conducted into how to structure larger natural language explanations of provenance graphs. This is done by inviting a number of provenance experts to describe a sequence of provenance graphs presented diagrammatically, and analysing the way they approach this task. This reveals that the responses of the experts correlated strongly with the visual layout of the diagrams, and also that the experts were split as to whether to structure their explanations in chronological or anti-chronological order. Finally, a further study investigates how chronology affects the perceived quality of the generated natural language explanations, revealing that in aggregate the participants considered the chronological ordering to be more logical. This dissertation concludes with a summary of the contributions made to the state of the art and proposes a number of possible areas for future research.
University of Southampton
Richardson, Darren Paul
f55f06e8-4f92-4399-b365-558b4e64d65d
June 2018
Moreau, Luc
033c63dd-3fe9-4040-849f-dfccbe0406f8
Smart, Paul
cd8a3dbf-d963-4009-80fb-76ecc93579df
Shadbolt, Nigel
5c5acdf4-ad42-49b6-81fe-e9db58c2caf7
Ramchurn, Sarvapali
1d62ae2a-a498-444e-912d-a6082d3aaea3
Costanza, Enrico
0868f119-c42e-4b5f-905f-fe98c1beeded
Yang, Yang
4f250291-4405-49b3-a662-eb9810e00415
Popov, Igor
517af6d0-e80b-45fd-89ef-9da71a09bd6f
Hall, Wendy
11f7f8db-854c-4481-b1ae-721a51d8790c
Glaser, Hugh
df88ca22-a72f-4fb6-9784-6578737d8af4
Schraefel, Monica
ac304659-1692-47f6-b892-15113b8c929f
Millard, Ian
0cc12fdc-51b4-460a-a773-01b0975d2a71
Lalani, Zahra
fe90e47d-c220-43aa-9b61-a834a403e7a8
Smith, Daniel
8d05522d-e91e-4aa7-8972-e362e73f005c
Berners-Lee, Tim
5a589ebb-05c1-43fa-b49e-c0b70739b3dd
Correndo, Gianluca
fea0843a-6d4a-4136-8784-0d023fcde3e2
Van Kleek, Max
4d869656-cd47-4cdf-9a4f-697fa9ba4105
Alsubaie, Saad
dc2c6aee-a9af-439d-9fb8-cf5c3091b9b4
Richardson, Darren Paul (2018) The domain agnostic generation of natural language explanations from provenance graphs. University of Southampton, Doctoral Thesis, 148pp.
Record type: Thesis (Doctoral)
Text: Final thesis - Version of Record
More information
Published date: June 2018
Identifiers
Local EPrints ID: 423465
URI: http://eprints.soton.ac.uk/id/eprint/423465
PURE UUID: 9fbaa2c3-48e3-47f4-99bb-a63d1c83d3c7
Catalogue record
Date deposited: 24 Sep 2018 16:30
Last modified: 16 Mar 2024 03:44
Contributors
Author: Darren Paul Richardson
Thesis advisor: Luc Moreau
Thesis advisor: Paul Smart
Thesis advisor: Nigel Shadbolt
Thesis advisor: Sarvapali Ramchurn
Thesis advisor: Enrico Costanza
Thesis advisor: Yang Yang
Thesis advisor: Igor Popov
Thesis advisor: Hugh Glaser
Thesis advisor: Monica Schraefel
Thesis advisor: Ian Millard
Thesis advisor: Zahra Lalani
Thesis advisor: Daniel Smith
Thesis advisor: Tim Berners-Lee
Thesis advisor: Gianluca Correndo
Thesis advisor: Max Van Kleek
Thesis advisor: Saad Alsubaie