Using RDF graph provenance to efficiently propagate SPARQL updates

Naja, Iman (2019) Using RDF graph provenance to efficiently propagate SPARQL updates. University of Southampton, Doctoral Thesis, 216pp.

Record type: Thesis (Doctoral)

Abstract

On the Semantic Web, information is published as machine-readable graphs expressed as RDF triples. Information consumers may combine and repackage that information as derived graphs based on the originally published source graphs. In addition, the formal semantics of RDF and OWL permit inference, by which reasoners generate entailed graphs: derived graphs containing newly inferred information. The dynamic nature of information presents a challenge when dealing with derived or inferred information: if a source graph changes, any graphs derived from it must be updated to preserve their integrity. However, recomputing derived graphs in this way can be expensive. This is analogous to the view update problem in databases, where changes in source data affect materialised views. Common approaches to this problem use the Delete and Re-Derive (DRed) algorithm to perform incremental view materialisation. To minimise the resources needed to propagate source graph updates to derived and entailed graphs, we propose to use the provenance of those graphs to guide their recomputation. The provenance of a graph is the documentation of the history of that graph. Provenance is a key requirement in a range of Web applications, and to that end the W3C has endorsed the PROV data model and ontology for representing provenance on the Web as RDF graphs. However, provenance may be recorded at different granularities, with significant cost implications: a naïve application of DRed to the graph rederivation problem, one that tracked the provenance of every individual triple in each graph, would generate a provenance graph much larger than the original source graphs.
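For readers unfamiliar with DRed, the following is a minimal sketch of the classic algorithm (overdelete, rederive, insert) over a toy two-rule, RDFS-style rule set. It is not the modified DRed developed in the thesis, and it tracks nothing at graph level; the rule set, the prefixed names, the example triples, and the function names `rule_instances`, `saturate`, `closure` and `dred` are all illustrative assumptions. The final assertion checks that the incremental result equals a full recomputation.

```python
from itertools import product

# Hypothetical prefixed names for a toy RDFS-style rule set.
SUB = "rdfs:subClassOf"
TYPE = "rdf:type"


def rule_instances(facts):
    """Yield (premises, conclusion) for every instance of two toy rules:
    (a subClassOf b), (b subClassOf c) => (a subClassOf c)
    (x type a),       (a subClassOf b) => (x type b)"""
    for t1, t2 in product(facts, repeat=2):
        (s1, p1, o1), (s2, p2, o2) = t1, t2
        if p2 == SUB and o1 == s2 and p1 in (SUB, TYPE):
            yield (t1, t2), (s1, p1, o2)


def saturate(facts, delta):
    """Semi-naive forward chaining: fire only rule instances that use a new fact."""
    facts, delta = set(facts) | set(delta), set(delta)
    while delta:
        new = {c for prem, c in rule_instances(facts)
               if any(p in delta for p in prem)} - facts
        facts |= new
        delta = new
    return facts


def closure(explicit):
    """Full materialisation from scratch (the expensive baseline)."""
    return saturate(set(), explicit)


def dred(explicit, materialised, deletions, insertions):
    """Incrementally maintain `materialised` using Delete and Re-Derive."""
    new_explicit = (explicit - deletions) | insertions
    # 1. Overdelete: remove deleted facts and everything derivable using them.
    over = deletions & materialised
    frontier = set(over)
    while frontier:
        new = {c for prem, c in rule_instances(materialised)
               if any(p in frontier for p in prem)} - over
        over |= new
        frontier = new
    remaining = materialised - over
    # 2. Rederive: restore overdeleted facts that are still explicit or that
    #    have a one-step derivation from the surviving facts.
    rederived = {t for t in over if t in new_explicit}
    rederived |= {c for _, c in rule_instances(remaining) if c in over}
    # 3. Insert: propagate rederived and newly inserted facts to a fixpoint.
    return new_explicit, saturate(remaining, rederived | (insertions - remaining))


explicit = {
    ("Student", SUB, "Person"),
    ("GradStudent", SUB, "Student"),
    ("alice", TYPE, "GradStudent"),
    ("alice", TYPE, "Student"),
}
materialised = closure(explicit)
new_explicit, updated = dred(explicit, materialised,
                             {("GradStudent", SUB, "Student")}, set())
assert updated == closure(new_explicit)  # same result as full recomputation
```

In this example, deleting `GradStudent subClassOf Student` overdeletes `alice type Person`, which is then restored because `alice type Student` is still stated explicitly. The overdeletion step is exactly where per-triple bookkeeping becomes costly at scale, which motivates recording dependencies at a coarser granularity.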

In this thesis, we present RGPROV, a lightweight extension to the PROV ontology for representing the creation and updating of RDF graphs. RGPROV captures the dependencies that a derived graph has on its source graphs without documenting the provenance of individual triples, and facilitates the propagation of graph updates to derived graphs. Additionally, we present a modification to the DRed algorithm that enables the efficient propagation of updates to entailed graphs. By making use of RGPROV, we enable partial updates to entailed graphs without triple-level provenance; this reduces the need for complete recomputation yet yields an identical entailed graph while using fewer resources. To evaluate our approach, we developed a provenance-aware reimplementation and extension of the EvoGen benchmark for evolving RDF graphs, which is itself based on the widely used LUBM benchmark for RDF storage and SPARQL query engines.
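To illustrate why graph-level dependencies suffice to drive update propagation, here is a minimal sketch assuming the dependency records have already been extracted into a plain Python dictionary. RGPROV's actual vocabulary is not reproduced; the dictionary is a stand-in for graph-level derivation statements in the spirit of PROV's `prov:wasDerivedFrom`, and the `ex:` graph names and the `affected_graphs` function are hypothetical. Given an updated source graph, it finds every transitively dependent graph and orders them so that each is refreshed only after the graphs it depends on.

```python
from collections import deque
from graphlib import TopologicalSorter

# Hypothetical graph-level records: derived graph -> graphs it was derived from.
# One entry per graph, rather than per triple.
DERIVED_FROM = {
    "ex:students": {"ex:univ0", "ex:dept0"},
    "ex:entailed": {"ex:students", "ex:ontology"},
    "ex:report": {"ex:entailed"},
}


def affected_graphs(updated, derived_from):
    """Return every graph that transitively depends on `updated`, ordered so
    that each graph comes after the graphs it depends on."""
    # Invert the records: source graph -> graphs derived directly from it.
    dependents = {}
    for derived, sources in derived_from.items():
        for src in sources:
            dependents.setdefault(src, set()).add(derived)
    # Breadth-first search for all transitive dependents of the updated graph.
    affected, queue = set(), deque([updated])
    while queue:
        for d in dependents.get(queue.popleft(), ()):
            if d not in affected:
                affected.add(d)
                queue.append(d)
    # Topologically order only the affected graphs (raises CycleError if the
    # derivation records were cyclic, which a provenance history should not be).
    sub = {g: derived_from.get(g, set()) & affected for g in affected}
    return list(TopologicalSorter(sub).static_order())


order = affected_graphs("ex:dept0", DERIVED_FROM)
# ex:students, then ex:entailed, then ex:report
```

Under these assumptions, graphs unrelated to the change (here, anything not downstream of `ex:dept0`) are never touched, and each affected graph could then be recomputed or, for entailed graphs, maintained incrementally in the manner of DRed.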


Using RDF Graph Provenance to Efficiently Propagate SPARQL Updates - Version of Record

Available under License University of Southampton Thesis Licence.
