University of Southampton Institutional Repository

Recording and using provenance in a protein compressibility experiment

Groth, Paul, Miles, Simon, Fang, Weijan, Wong, Sylvia C., Zauner, Klaus-Peter and Moreau, Luc (2005) Recording and using provenance in a protein compressibility experiment. The 14th IEEE International Symposium on High Performance Distributed Computing (HPDC-14), Research Triangle Park, North Carolina. 24 - 27 Jul 2005.

Record type: Conference or Workshop Item (Paper)

Abstract

Very large scale computations are now becoming routinely used as a methodology to undertake scientific research. In this context, ‘provenance systems’ are regarded as the equivalent of the scientist’s logbook for in silico experimentation: provenance captures the documentation of the process that led to some result. Using a protein compressibility analysis application, we derive a set of generic use cases for a provenance system. In order to support these, we address the following fundamental questions: what is provenance? how to record it? what is the performance impact for grid execution? what is the performance of reasoning? In doing so, we define a technology-independent notion of provenance that captures interactions between components, internal component information and grouping of interactions, so as to allow us to analyse and reason about the execution of scientific processes. In order to support persistent provenance in heterogeneous applications, we introduce a separate provenance store, in which provenance documentation can be stored, archived and queried independently of the technology used to run the application. Through a series of practical tests, we evaluate the performance impact of such a provenance system. In summary, we demonstrate that provenance recording overhead of our prototype system remains under 10% of execution time, and we show that the recorded information successfully supports our use cases in a performant manner.

Text: hpdc05.pdf - Accepted Manuscript (166kB)

More information

Published date: 2005
Additional Information: Event Dates: 24-27 July, 2005
Venue - Dates: The 14th IEEE International Symposium on High Performance Distributed Computing (HPDC-14), Research Triangle Park, North Carolina, 2005-07-24 - 2005-07-27
Keywords: Provenance, Grid, protein compressibility
Organisations: Web & Internet Science, Agents, Interactions & Complexity

Identifiers

Local EPrints ID: 260910
URI: http://eprints.soton.ac.uk/id/eprint/260910
PURE UUID: 53ce0f67-53f7-4529-91d9-41edc938fc59
ORCID for Luc Moreau: orcid.org/0000-0002-3494-120X

Catalogue record

Date deposited: 24 May 2005
Last modified: 14 Mar 2024 06:45

Contributors

Author: Paul Groth
Author: Simon Miles
Author: Weijan Fang
Author: Sylvia C. Wong
Author: Klaus-Peter Zauner
Author: Luc Moreau
