Deep learning provenance data integration: a practical approach
A Deep Learning (DL) life cycle involves several data transformations, such as performing data pre-processing, defining datasets to train and test a deep neural network (DNN), and training and evaluating the DL model. Choosing a final model requires DL model selection, which involves analyzing data from several training configurations (e.g. hyperparameters and DNN architectures). Tracing training data back to pre-processing operations can provide insights into the model selection step. Provenance is a natural solution to represent data derivation of the whole DL life cycle. However, there are challenges in providing an integration of the provenance of these different steps. There are a few approaches to capturing and integrating provenance data from the DL life cycle, but they require that the same provenance capture solution is used along all the steps, which can limit interoperability and flexibility when choosing the DL environment. Therefore, in this work, we present a prototype for provenance data integration using different capture solutions. We show use cases where the integrated provenance from pre-processing and training steps can show how data pre-processing decisions influenced the model selection. Experiments were performed using real-world datasets to train a DNN and provided evidence of the integration between the considered steps, answering queries such as how the data used to train a model that achieved a specific result was processed.
Data Pre-processing, Deep Learning, Provenance
1542-1550
Association for Computing Machinery
Pina, Débora
Chapman, Adriane
De Oliveira, Daniel
Mattoso, Marta
30 April 2023
Pina, Débora, Chapman, Adriane, De Oliveira, Daniel and Mattoso, Marta (2023) Deep learning provenance data integration: a practical approach. Ding, Ying, Tang, Jie and Sequeda, Juan (eds.) In WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023. Association for Computing Machinery. pp. 1542-1550. (doi:10.1145/3543873.3587561).
Record type:
Conference or Workshop Item (Paper)
More information
e-pub ahead of print date: 30 April 2023
Published date: 30 April 2023
Additional Information:
Funding Information:
This work was partially funded by EPSRC (EP/S028366/1), FAPERJ, CNPq, and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
Venue - Dates:
ACM Web Conference 2023, Austin, United States, 2023-04-30 - 2023-05-04
Identifiers
Local EPrints ID: 484837
URI: http://eprints.soton.ac.uk/id/eprint/484837
PURE UUID: 88413d2a-cb24-421b-b96c-91390e2d0cfd
Catalogue record
Date deposited: 22 Nov 2023 17:54
Last modified: 17 Mar 2024 03:46
Contributors
Author:
Débora Pina
Author:
Adriane Chapman
Author:
Daniel De Oliveira
Author:
Marta Mattoso
Editor:
Ying Ding
Editor:
Jie Tang
Editor:
Juan Sequeda