Using RDF graph provenance to efficiently propagate SPARQL updates
On the Semantic Web, information is published as machine-readable graphs expressed as RDF triples. Information consumers may combine and repackage that information as derived graphs which are based on the originally published source graphs. In addition, the formal semantics of RDF and OWL permit inference, by which reasoners generate entailed graphs: derived graphs containing newly inferred information. The dynamic nature of information presents a challenge when dealing with derived or inferred information; if a source graph changes, any graphs that are derived from it must be updated in order to preserve their integrity. However, such recomputation of derived graphs can be expensive. This is analogous to the view update problem in databases, where changes in source data affect materialised views. Common approaches to this problem use the Delete and Re-Derive (DRed) algorithm to perform incremental view materialisation. To minimise the resources needed to propagate source graph updates to derived and entailed graphs, we propose to use the provenance of those graphs to guide their recomputation. The provenance of a graph is the documentation of the history of that graph. Provenance is a key requirement in a range of Web applications, and to that end the W3C has endorsed the PROV data model and ontology for the representation of provenance on the Web as RDF graphs. However, provenance may be applied at different granularities, which has significant cost implications; a naïve application of DRed to the graph rederivation problem which individually tracked the provenance of the triples which comprise each graph would generate a provenance graph much larger than the original source graphs.
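To make the Delete and Re-Derive idea concrete, the following is a minimal sketch of DRed-style deletion over a toy materialisation. It is an illustration only, not the thesis's implementation: facts are plain `(subject, object)` pairs, the only inference rule is transitivity, and the names `derive_once`, `closure` and `dred_delete` are hypothetical.

```python
from itertools import product


def derive_once(facts):
    """Apply one assumed rule, transitivity: (a, b), (b, c) -> (a, c)."""
    return {(a, d) for (a, b), (c, d) in product(facts, repeat=2) if b == c}


def closure(base):
    """Full recomputation: apply the rule until no new facts appear."""
    result = set(base)
    while True:
        new = derive_once(result) - result
        if not new:
            return result
        result |= new


def dred_delete(base, materialised, deleted):
    """Incrementally remove `deleted` from a closed `materialised` set."""
    # Step 1 (overdelete): collect every fact with at least one derivation
    # that uses a deleted fact, joining against the old materialisation.
    over = set(deleted)
    frontier = set(deleted)
    while frontier:
        step = set()
        for (a, b) in frontier:
            for (c, d) in materialised:
                if b == c:          # frontier fact used as left premise
                    step.add((a, d))
                if d == a:          # frontier fact used as right premise
                    step.add((c, b))
        frontier = step - over
        over |= frontier
    kept = materialised - over

    # Step 2 (rederive): restore overdeleted facts that are still asserted
    # or that can be derived again from the surviving facts.
    kept |= over & (base - deleted)
    while True:
        back = (derive_once(kept) & over) - kept
        if not back:
            return kept
        kept |= back


base = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
materialised = closure(base)
deleted = {("B", "C")}
updated = dred_delete(base, materialised, deleted)
```

In this sketch the incremental result should match a full recomputation over `base - deleted`; the saving comes from touching only the facts reachable from the deleted one rather than rebuilding the whole closure.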
In this thesis, we present RGPROV, a light-weight extension to the PROV ontology for representing RDF graph creation and updates. RGPROV allows us to understand the dependencies that a derived graph has on its source graphs without the need to document the provenance of individual triples, and facilitates the propagation of graph updates to derived graphs. Additionally, we present a modification to the DRed algorithm that enables the efficient propagation of updates to entailed graphs. By making use of RGPROV, we enable partial updates to be made to the entailed graphs without the need for triple-level provenance, which reduces the need for complete recomputation but results in an identical entailed graph, while using fewer resources. In order to evaluate our approach, we developed a provenance-aware extension to and reimplementation of the EvoGen benchmark for evolving RDF graphs, itself based on the commonly-used LUBM benchmark for RDF storage and SPARQL query engines.
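The graph-level dependency idea can be sketched as follows. This is a hedged illustration under stated assumptions, not RGPROV itself: the dictionary `derived_from` stands in for graph-to-graph derivation links of the kind PROV expresses with `prov:wasDerivedFrom`, and the graph names and the function `affected_graphs` are hypothetical.

```python
from collections import deque


def affected_graphs(derived_from, updated):
    """List derived graphs that depend on `updated`, directly or
    transitively, ordered so each graph follows the graphs it is built from.

    `derived_from` maps a derived graph name to the set of graph names it
    was derived from (an assumed, simplified provenance record).
    """
    # Invert the provenance links: source graph -> graphs derived from it.
    dependants = {}
    for graph, sources in derived_from.items():
        for source in sources:
            dependants.setdefault(source, set()).add(graph)

    # Breadth-first search for everything downstream of the updated graph.
    affected = set()
    queue = deque([updated])
    while queue:
        current = queue.popleft()
        for dependant in dependants.get(current, ()):
            if dependant not in affected:
                affected.add(dependant)
                queue.append(dependant)

    # Kahn's algorithm restricted to the affected graphs, so that a graph
    # is only recomputed after all of its affected sources.
    pending = {g: len(derived_from[g] & affected) for g in affected}
    ready = deque(sorted(g for g, count in pending.items() if count == 0))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for dependant in sorted(dependants.get(current, ())):
            if dependant in pending:
                pending[dependant] -= 1
                if pending[dependant] == 0:
                    ready.append(dependant)
    return order


derived_from = {
    "merged": {"source1", "source2"},
    "entailed": {"merged"},
    "report": {"source3"},
}
to_update = affected_graphs(derived_from, "source1")
```

Because only graph-level links are recorded, the provenance stays small relative to the data; the trade-off is that the sketch identifies which graphs to revisit, not which individual triples changed.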
University of Southampton
Naja, Iman
December 2019
Gibbins, Nicholas
Naja, Iman (2019) Using RDF graph provenance to efficiently propagate SPARQL updates. University of Southampton, Doctoral Thesis, 216pp.
Record type: Thesis (Doctoral)
Text: Using RDF Graph Provenance to Efficiently Propagate SPARQL Updates - Version of Record
More information
Published date: December 2019
Identifiers
Local EPrints ID: 437665
URI: http://eprints.soton.ac.uk/id/eprint/437665
PURE UUID: 2c83bd2d-2cc0-4137-be7a-66fd39a7d92c
Catalogue record
Date deposited: 10 Feb 2020 17:31
Last modified: 17 Mar 2024 02:47
Contributors
Author: Iman Naja
Thesis advisor: Nicholas Gibbins