The University of Southampton
University of Southampton Institutional Repository

Improvement of biomedical dataset search through the integration of provenance


Almuntashiri, Abdullah (2025) Improvement of biomedical dataset search through the integration of provenance. University of Southampton, Doctoral Thesis, 231pp.

Record type: Thesis (Doctoral)

Abstract

Efforts to support the application of Findable, Accessible, Interoperable, and Reusable (FAIR) principles in the biomedical research domain have led to an increase in the availability of datasets online, facilitating data exchange and reuse. This application significantly enhances research reproducibility and reduces the resources required to conduct research from scratch. As public biomedical repositories proliferate, an enormous number of datasets, encompassing various types of data, have become available to biomedical researchers. However, researchers require methods and tools that assist them in searching for and discovering relevant datasets. They still face challenges when using existing search engines, which may not be well-suited to biomedical research domains. These challenges include a lack of dataset metadata, which affects their ability to select relevant datasets.

In this research, we first conducted semi-structured interviews to deepen our understanding of how biomedical researchers search for datasets and the challenges they encounter. Based on the findings of this first study, we focused on a specific challenge, the lack of provenance metadata, and its impact on the decision-making process. We then evaluated, through a user study, how provenance information enhances dataset search. Following this, we developed a provenance extraction tool that automatically extracts provenance information from dataset-based biomedical publications, and we estimated its scalability across all articles on exome sequencing experiments in PubMed. We conclude our research by evaluating the usefulness of the provenance extraction tool for dataset search through a user experience study.

The findings of this research provide a positive perspective on integrating provenance into biomedical dataset search. The results confirm the usefulness of provenance information in improving dataset search within the biomedical research domain, where the extracted information assists in enhancing decision-making and facilitates the selection of appropriate datasets.

More information

Published date: 2025

Identifiers

Local EPrints ID: 504365
URI: http://eprints.soton.ac.uk/id/eprint/504365
PURE UUID: 96e037da-7444-4f05-8162-5290b1fbae3d
ORCID for Abdullah Almuntashiri: orcid.org/0000-0002-7343-6468
ORCID for Age Chapman: orcid.org/0000-0002-3814-2587
ORCID for Luis-Daniel Ibáñez: orcid.org/0000-0001-6993-0001

Catalogue record

Date deposited: 08 Sep 2025 16:48
Last modified: 11 Sep 2025 03:20

Contributors

Author: Abdullah Almuntashiri
Thesis advisor: Age Chapman
Thesis advisor: Luis-Daniel Ibáñez
