Read Me File for Dataset Supporting the University of Southampton Doctoral Thesis "Improvement of Biomedical Dataset Search Through the Integration of Provenance"

Dataset DOI: https://doi.org/10.5258/SOTON/D3656

ReadMe Author: Abdullah Almunyashiri, University of Southampton, ORCID 0000-0002-7343-6468

Date of data collection: 07/2023 - 08/2024

This dataset supports the University of Southampton doctoral thesis “Improvement of Biomedical Dataset Search Through the Integration of Provenance.”

The dataset contains two files:

Prompts_experiment
    This file contains the data presented in Chapter 6. It includes the proposed prompts for extracting provenance information from publications, along with the results of testing these prompts.

Scalability_experiment
    This file contains the data presented in Chapter 6 relating to the scalability of the extractor. The experiments assessed scalability by evaluating the extractor’s performance as the dataset size (i.e., the number of files) increased. Two performance metrics were measured against dataset size: cost and response time.

Information about geographic location of data collection: All data were collected at the University of Southampton, U.K.

The experimental protocol was approved by the University of Southampton (ERGO/FEPS/92985). All research was performed in accordance with the relevant guidelines and regulations.

Date that the file was created: 05/09/2025