Improvement of biomedical dataset search through the integration of provenance
Efforts to apply the Findable, Accessible, Interoperable, and Reusable (FAIR) principles in the biomedical research domain have increased the number of datasets available online, facilitating data exchange and reuse. Applying these principles enhances research reproducibility and reduces the resources required to conduct research from scratch. As public biomedical repositories proliferate, an enormous number of datasets of various types have become available to biomedical researchers. However, researchers need methods and tools that help them search for and discover relevant datasets. They still face challenges when using existing search engines, which may not be well suited to biomedical research domains. One such challenge is a lack of dataset metadata, which hinders their ability to select relevant datasets.
In this research, we first conducted semi-structured interviews to deepen our understanding of how biomedical researchers search for datasets and the challenges they encounter. Based on the findings of this first study, we focused on one specific challenge, the lack of provenance metadata, and its impact on the decision-making process. We then ran a user study to evaluate how provenance information enhances dataset search. Following this, we developed a tool that automatically extracts provenance information from biomedical publications that are based on datasets, and we estimated its scalability across all articles on exome sequencing experiments in PubMed. We conclude by evaluating the usefulness of the provenance extraction tool for dataset search through a user experience study.
The findings of this research support integrating provenance into biomedical dataset search. The results confirm that provenance information improves dataset search within the biomedical research domain: the extracted information supports decision-making and facilitates the selection of appropriate datasets.
University of Southampton
Almuntashiri, Abdullah
2025
Chapman, Age
Ibáñez, Luis-Daniel
Almuntashiri, Abdullah (2025) Improvement of biomedical dataset search through the integration of provenance. University of Southampton, Doctoral Thesis, 231pp.
Record type: Thesis (Doctoral)
Text
Final-thesis-submission-Examination-Mr-Abdullah-Almuntashiri
Restricted to Repository staff only
More information
Published date: 2025
Identifiers
Local EPrints ID: 504365
URI: http://eprints.soton.ac.uk/id/eprint/504365
PURE UUID: 96e037da-7444-4f05-8162-5290b1fbae3d
Catalogue record
Date deposited: 08 Sep 2025 16:48
Last modified: 11 Sep 2025 03:20
Contributors
Author: Abdullah Almuntashiri
Thesis advisor: Luis-Daniel Ibáñez