Connecting Scientific Data to Scientific Experiments with Provenance
As scientific workflows, and the data they operate on, grow in size and complexity, the task of defining how those workflows should execute (which resources they should use, where those resources should be in preparation for processing, etc.) becomes proportionally more difficult. 'Workflow compilers' such as Pegasus greatly reduce this burden, but a further problem arises: because the details of execution are now specified automatically, a workflow's results are in part due to execution specifics that were decided automatically. Automating the steps between the original experiment design and its results thus breaks the connection between them, making the results harder to interpret. To reconnect the scientific data with the original experiment, we argue that scientists should have access to the full provenance of their data, including not only parameters, input data and intermediary results, but also the abstract experiment as refined into a concrete execution by the 'workflow compiler'. In this paper, we describe our preliminary work on adapting Pegasus to capture the process of workflow refinement in the PASOA provenance system.
179-186
Miles, Simon
Deelman, Ewa
Groth, Paul
Vahi, Karan
Mehta, Gaurang
Moreau, Luc
December 2007
Miles, Simon, Deelman, Ewa, Groth, Paul, Vahi, Karan, Mehta, Gaurang and Moreau, Luc (2007) Connecting Scientific Data to Scientific Experiments with Provenance. Proceedings of the third IEEE International Conference on e-Science and Grid Computing (e-Science'07).
Record type: Conference or Workshop Item (Paper)
Text: escience07.pdf - Accepted Manuscript
More information
Published date: December 2007
Venue - Dates: Proceedings of the third IEEE International Conference on e-Science and Grid Computing (e-Science'07), 2007-12-01
Organisations: Web & Internet Science
Identifiers
Local EPrints ID: 271188
URI: http://eprints.soton.ac.uk/id/eprint/271188
PURE UUID: 2e5f85e4-c44c-457f-9ca6-99f1558a761f
Catalogue record
Date deposited: 27 May 2010 10:35
Last modified: 14 Mar 2024 09:25
Contributors
Author: Simon Miles
Author: Ewa Deelman
Author: Paul Groth
Author: Karan Vahi
Author: Gaurang Mehta
Author: Luc Moreau