A Taxonomy of Dataset Search
Almuntashiri, Abdullah Hamed
aa118cfa-3b60-4717-9855-2816bbbb28d0
Ibáñez, Luis-Daniel
65a2e20b-74a9-427d-8c4c-2330285153ed
Chapman, Adriane
721b7321-8904-4be2-9b01-876c430743f1
16 October 2022
Almuntashiri, Abdullah Hamed, Ibáñez, Luis-Daniel and Chapman, Adriane (2022) A Taxonomy of Dataset Search. In Lecture Notes on Data Engineering and Communications Technologies. Springer. 12 pp.
Record type:
Conference or Workshop Item (Paper)
Abstract
The demand for and use of data have increased in all life science domains, particularly in scientific communities. Data is organised into datasets, which are used in many tasks, e.g. training machine learning (ML) models. These datasets are stored either privately or publicly in repositories or data portals that can be published on the Web. The need to find and reuse datasets has given rise to a new research field that focuses on the process of searching for datasets that meet users’ needs. The purpose of this paper is therefore to explore the dataset search literature in order to identify and classify the methods, algorithms, systems and benchmarks used. We performed a complete search of the dataset search literature on various search engines, scientific sites and digital libraries. We found more than 100 dataset search articles and narrowed them to 31 after applying the exclusion criteria. As a result, we designed a new dataset search taxonomy based on the search style that users employ to retrieve datasets.
More information
Accepted/In Press date: 15 October 2022
Published date: 16 October 2022
Venue - Dates:
The 3rd International Conference of Advanced Computing and Informatics (ICACIN 2022), Hassan II University Library, Casablanca, Morocco, 2022-10-15 - 2022-10-16
Keywords:
Dataset Search, Dataset Retrieval, Dataset Discovery
Identifiers
Local EPrints ID: 471518
URI: http://eprints.soton.ac.uk/id/eprint/471518
ISSN: 2367-4512
PURE UUID: 743cc13d-a2c6-4a22-b1f6-f5c3b7e61176
Catalogue record
Date deposited: 10 Nov 2022 17:30
Last modified: 01 Aug 2024 02:01
Contributors
Author:
Abdullah Hamed Almuntashiri
Author:
Luis-Daniel Ibáñez