Deep learning provenance data integration: a practical approach

A Deep Learning (DL) life cycle involves several data transformations, such as pre-processing data, defining the datasets used to train and test a deep neural network (DNN), and training and evaluating the DL model. Choosing a final model requires DL model selection, which involves analyzing data from several training configurations (e.g., hyperparameters and DNN architectures). Tracing training data back to pre-processing operations can provide insights into the model selection step. Provenance is a natural way to represent data derivation across the whole DL life cycle. However, integrating the provenance of these different steps is challenging. The few existing approaches to capturing and integrating provenance data from the DL life cycle require that the same provenance capture solution be used across all steps, which can limit interoperability and flexibility when choosing the DL environment. Therefore, in this work, we present a prototype for integrating provenance data captured by different solutions. We show use cases in which the integrated provenance from the pre-processing and training steps reveals how data pre-processing decisions influenced model selection. Experiments using real-world datasets to train a DNN provided evidence of the integration between the considered steps, answering queries such as how the data used to train a model that achieved a specific result was processed.
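The kind of query mentioned at the end of the abstract can be illustrated with a minimal sketch. This is not the authors' prototype: the record layout, the file and model names, and the `integrate` and `lineage` helpers are all hypothetical, and the sketch assumes that both capture solutions identify a shared dataset by the same name.

```python
# Hypothetical provenance records, each as (activity, inputs, outputs).
# One list stands in for a pre-processing capture tool, the other for a
# separate training capture tool; all names are illustrative only.
preprocessing_prov = [
    ("normalize", ["raw.csv"], ["normalized.csv"]),
    ("split", ["normalized.csv"], ["train.csv", "test.csv"]),
]
training_prov = [
    ("train_lr_0.01", ["train.csv"], ["model_a"]),
    ("train_lr_0.1", ["train.csv"], ["model_b"]),
]


def integrate(*sources):
    """Merge records from several sources into one map:
    entity -> (activity that generated it, inputs that activity used)."""
    generated_by = {}
    for records in sources:
        for activity, inputs, outputs in records:
            for out in outputs:
                generated_by[out] = (activity, inputs)
    return generated_by


def lineage(entity, generated_by):
    """Return the activities that led to `entity`, earliest first."""
    steps = []
    seen = set()

    def visit(current):
        # Stop at entities already visited or with no recorded origin.
        if current in seen or current not in generated_by:
            return
        seen.add(current)
        activity, inputs = generated_by[current]
        for source_entity in inputs:
            visit(source_entity)
        if activity not in steps:
            steps.append(activity)

    visit(entity)
    return steps


graph = integrate(preprocessing_prov, training_prov)
print(lineage("model_a", graph))
```

The join works only because `train.csv` appears under the same identifier in both record sets; in practice, matching entities across different capture solutions is the hard part of the integration.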

Data Pre-processing, Deep Learning, Provenance

10.1145/3543873.3587561

1542-1550

Association for Computing Machinery

Pina, Débora

Chapman, Adriane

De Oliveira, Daniel

Mattoso, Marta

Ding, Ying

Tang, Jie

Sequeda, Juan

30 April 2023

Pina, Débora, Chapman, Adriane, De Oliveira, Daniel and Mattoso, Marta (2023) Deep learning provenance data integration: a practical approach. In Ding, Ying, Tang, Jie and Sequeda, Juan (eds.) WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023. Association for Computing Machinery. pp. 1542-1550. (doi:10.1145/3543873.3587561).

Record type: Conference or Workshop Item (Paper)

More information

e-pub ahead of print date: 30 April 2023

Published date: 30 April 2023

Additional Information: Funding Information: This work was partially funded by EPSRC (EP/SO28366/1), FAPERJ, CNPq, and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Venue - Dates: ACM Web Conference 2023, Austin, United States, 2023-04-30 - 2023-05-04
