The University of Southampton
University of Southampton Institutional Repository

Deep learning provenance data integration: a practical approach

Pina, Débora, Chapman, Adriane, De Oliveira, Daniel and Mattoso, Marta (2023) Deep learning provenance data integration: a practical approach. Ding, Ying, Tang, Jie and Sequeda, Juan (eds.) In WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023. Association for Computing Machinery. pp. 1542-1550. (doi:10.1145/3543873.3587561).

Record type: Conference or Workshop Item (Paper)

Abstract

A Deep Learning (DL) life cycle involves several data transformations, such as performing data pre-processing, defining datasets to train and test a deep neural network (DNN), and training and evaluating the DL model. Choosing a final model requires DL model selection, which involves analyzing data from several training configurations (e.g. hyperparameters and DNN architectures). Tracing training data back to pre-processing operations can provide insights into the model selection step. Provenance is a natural solution to represent data derivation of the whole DL life cycle. However, there are challenges in providing an integration of the provenance of these different steps. There are a few approaches to capturing and integrating provenance data from the DL life cycle, but they require that the same provenance capture solution is used along all the steps, which can limit interoperability and flexibility when choosing the DL environment. Therefore, in this work, we present a prototype for provenance data integration using different capture solutions. We show use cases where the integrated provenance from pre-processing and training steps can show how data pre-processing decisions influenced the model selection. Experiments were performed using real-world datasets to train a DNN and provided evidence of the integration between the considered steps, answering queries such as how the data used to train a model that achieved a specific result was processed.

This record has no associated files available for download.

More information

e-pub ahead of print date: 30 April 2023
Published date: 30 April 2023
Additional Information: Funding Information: This work was partially funded by EPSRC (EP/SO28366/1), FAPERJ, CNPq, and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
Venue - Dates: ACM Web Conference 2023, Austin, United States, 2023-04-30 - 2023-05-04
Keywords: Data Pre-processing, Deep Learning, Provenance

Identifiers

Local EPrints ID: 484837
URI: http://eprints.soton.ac.uk/id/eprint/484837
PURE UUID: 88413d2a-cb24-421b-b96c-91390e2d0cfd
ORCID for Adriane Chapman: orcid.org/0000-0002-3814-2587

Catalogue record

Date deposited: 22 Nov 2023 17:54
Last modified: 17 Mar 2024 03:46

Contributors

Author: Débora Pina
Author: Adriane Chapman
Author: Daniel De Oliveira
Author: Marta Mattoso
Editor: Ying Ding
Editor: Jie Tang
Editor: Juan Sequeda
