The University of Southampton
University of Southampton Institutional Repository

A Taxonomy of Dataset Search


Almuntashiri, Abdullah Hamed, Ibáñez, Luis-Daniel and Chapman, Adriane (2022) A Taxonomy of Dataset Search. In Lecture Notes on Data Engineering and Communications Technologies. Springer. 12 pp.

Record type: Conference or Workshop Item (Paper)

Abstract

The demand for and use of data have increased in all life science domains, particularly in scientific communities. Data are organised into datasets, which are used in many tasks, e.g. training machine learning (ML) models. These datasets are stored, either privately or publicly, in repositories or data portals that can be published on the Web. The need to find and reuse datasets has given rise to a new research field that focuses on searching for datasets that meet users' needs. The purpose of this paper is therefore to explore the dataset search literature in order to identify and classify the methods, algorithms, systems and benchmarks it uses. We performed a complete search of the dataset search literature across various search engines, scientific sites and digital libraries. We found more than 100 dataset search articles and narrowed them to 31 after applying the exclusion criteria. As a result, we designed a new dataset search taxonomy based on the search style that users adopt to retrieve datasets.

Full text (A Taxonomy of Dataset Search): restricted to repository staff only.

More information

Accepted/In Press date: 15 October 2022
Published date: 16 October 2022
Venue - Dates: The 3rd International Conference of Advanced Computing and Informatics (ICACIN 2022), Hassan II University Library, Casablanca, Morocco, 2022-10-15 - 2022-10-16
Keywords: Dataset Search, Dataset Retrieval, Dataset Discovery

Identifiers

Local EPrints ID: 471518
URI: http://eprints.soton.ac.uk/id/eprint/471518
ISSN: 2367-4512
PURE UUID: 743cc13d-a2c6-4a22-b1f6-f5c3b7e61176
ORCID for Abdullah Hamed Almuntashiri: orcid.org/0000-0002-7343-6468
ORCID for Luis-Daniel Ibáñez: orcid.org/0000-0001-6993-0001
ORCID for Adriane Chapman: orcid.org/0000-0002-3814-2587

Catalogue record

Date deposited: 10 Nov 2022 17:30
Last modified: 01 Aug 2024 02:01


Contributors

Author: Abdullah Hamed Almuntashiri
Author: Luis-Daniel Ibáñez
Author: Adriane Chapman



