The University of Southampton
University of Southampton Institutional Repository

LLMs for the post-hoc creation of provenance

Almuntashiri, Abdullah Hamed, Ibáñez, Luis-Daniel and Chapman, Adriane (2024) LLMs for the post-hoc creation of provenance. In 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW): 16th International Workshop on Theory and Practice of Provenance. pp. 562-566. (doi:10.1109/EuroSPW61312.2024.00068).

Record type: Conference or Workshop Item (Paper)

Abstract

Provenance information is an essential component that facilitates the reproduction of scientific experiments, the assessment of data quality, and related tasks. However, capturing provenance at the time of observation is sometimes difficult, so post-hoc methods are needed. In this paper, we explore the ability of large language models (LLMs) to access and extract provenance information from scientific papers using a set of specially designed prompts. We then identify and recommend the most effective prompt for extracting provenance from papers. Our findings confirm that ChatGPT-4 can access and extract provenance information from biomedical research papers.

Text
672900a562
Restricted to Repository staff only

More information

Published date: 12 July 2024
Keywords: provenance, LLMs

Identifiers

Local EPrints ID: 492146
URI: http://eprints.soton.ac.uk/id/eprint/492146
PURE UUID: c661437c-be30-4017-b504-f070ac474d01
ORCID for Abdullah Hamed Almuntashiri: orcid.org/0000-0002-7343-6468
ORCID for Luis-Daniel Ibáñez: orcid.org/0000-0001-6993-0001
ORCID for Adriane Chapman: orcid.org/0000-0002-3814-2587

Catalogue record

Date deposited: 18 Jul 2024 16:31
Last modified: 01 Aug 2024 02:01


Contributors

Author: Abdullah Hamed Almuntashiri
Author: Luis-Daniel Ibáñez
Author: Adriane Chapman

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.
