The University of Southampton
University of Southampton Institutional Repository

Provenance network analytics: An approach to data analytics using data provenance


Huynh, Trung Dong, Ebden, Mark, Fischer, Joel, Roberts, Stephen and Moreau, Luc (2018) Provenance network analytics: An approach to data analytics using data provenance. Data Mining and Knowledge Discovery. (doi:10.1007/s10618-017-0549-3).

Record type: Article

Abstract

Provenance network analytics is a novel data analytics approach that helps infer properties of data, such as quality or importance, from their provenance. Instead of analysing application data, which are typically domain-dependent, it analyses the data's provenance as represented using the World Wide Web Consortium's domain-agnostic PROV data model. Specifically, the approach proposes a number of network metrics for provenance data and applies established machine learning techniques over such metrics to build predictive models for some key properties of data. Applying this method to the provenance of real-world data from three different applications, we show that it can successfully identify the owners of provenance documents, assess the quality of crowdsourced data, and identify instructions from chat messages in an alternate-reality game with high levels of accuracy. By so doing, we demonstrate the different ways the proposed provenance network metrics can be used in analysing data, providing the foundation for provenance-based data analytics.
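The abstract describes computing network metrics over a PROV graph and then training standard machine learning models on those metrics. As a minimal sketch only, the snippet below turns a small provenance graph into a feature dictionary that a classifier could consume. The toy graph, the helper `network_metrics`, and the particular metrics (node and edge counts, per-type node counts, largest finite shortest-path distance) are illustrative assumptions. They are not the paper's exact metric set or code.

```python
from collections import deque

# Hypothetical toy provenance graph (assumption, not data from the paper).
# Node types follow PROV's three core classes: entity, activity, agent.
NODE_TYPES = {
    "doc:v2": "entity",
    "doc:v1": "entity",
    "act:edit": "activity",
    "ag:alice": "agent",
}
# Edges as (source, PROV relation, target) triples.
EDGES = [
    ("doc:v2", "wasDerivedFrom", "doc:v1"),
    ("doc:v2", "wasGeneratedBy", "act:edit"),
    ("act:edit", "used", "doc:v1"),
    ("act:edit", "wasAssociatedWith", "ag:alice"),
]


def undirected_adjacency(nodes, edges):
    """Build an undirected adjacency map, ignoring relation labels."""
    adj = {n: set() for n in nodes}
    for src, _rel, dst in edges:
        adj[src].add(dst)
        adj[dst].add(src)
    return adj


def eccentricity(adj, start):
    """Longest shortest-path distance from start to any reachable node (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return max(dist.values())


def network_metrics(node_types, edges):
    """Hypothetical feature extractor: one feature dict per provenance graph."""
    adj = undirected_adjacency(node_types, edges)
    metrics = {
        "nodes": len(node_types),
        "edges": len(edges),
        # Largest finite distance; equals the diameter when the graph is connected.
        "diameter": max(eccentricity(adj, n) for n in adj),
    }
    for kind in ("entity", "activity", "agent"):
        metrics[f"n_{kind}"] = sum(1 for t in node_types.values() if t == kind)
    return metrics


print(network_metrics(NODE_TYPES, EDGES))
```

Each provenance document would yield one such feature vector. An off-the-shelf classifier could then be trained on these vectors against the property of interest, such as the quality label of a crowdsourced item.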

huynh_provanalytics_dmkd_v3.pdf - Accepted Manuscript (385kB)
Provenance - Version of Record (1MB), available under a Creative Commons Attribution license.

More information

Accepted/In Press date: 26 December 2017
e-pub ahead of print date: 15 February 2018
Keywords: data provenance, data analytics, network metrics, graph classification

Identifiers

Local EPrints ID: 416917
URI: http://eprints.soton.ac.uk/id/eprint/416917
ISSN: 1384-5810
PURE UUID: 52ba5997-3388-4899-8f24-45ab1a5f6921
ORCID for Trung Dong Huynh: orcid.org/0000-0003-4937-2473
ORCID for Luc Moreau: orcid.org/0000-0002-3494-120X

Catalogue record

Date deposited: 15 Jan 2018 17:30
Last modified: 16 Mar 2024 06:06

Contributors

Author: Trung Dong Huynh
Author: Mark Ebden
Author: Joel Fischer
Author: Stephen Roberts
Author: Luc Moreau
