The University of Southampton
University of Southampton Institutional Repository

Dataset search: a survey

Chapman, Adriane, Simperl, Elena, Koesten, Laura, Konstantinidis, Georgios, Ibanez Gonzalez, Luis, Kacprzak, Emilia and Groth, Paul (2019) Dataset search: a survey. The VLDB Journal, 1-22. (doi:10.1007/s00778-019-00564-x).

Record type: Article

Abstract

Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, ranging from scientific publishers asking authors to submit data alongside their manuscripts, to data marketplaces, open data portals and data communities. Google recently released a beta version of a search service for datasets, which allows users to discover data stored in various online repositories via keyword queries. These developments foreshadow an emerging research field around dataset search or retrieval that broadly encompasses frameworks, methods and tools that help match a user's data need against a collection of datasets. Here, we survey the state of the art of research and commercial systems and discuss what makes dataset search a field in its own right, with unique challenges and open questions. We look at approaches and implementations from related areas that dataset search draws upon, including information retrieval, databases, and entity-centric and tabular search, in order to identify possible paths to tackle these questions as well as immediate next steps that will take the field forward.

Text
ChapmanDatasetSearchFinal - Version of Record (1MB)
Available under the Creative Commons Attribution license.

More information

Accepted/In Press date: 12 August 2019
e-pub ahead of print date: 24 August 2019
Keywords: Dataset search · Dataset retrieval · Dataset · Information search and retrieval

Identifiers

Local EPrints ID: 433957
URI: https://eprints.soton.ac.uk/id/eprint/433957
ISSN: 1066-8888
PURE UUID: a367d245-023f-4a4d-82c9-6821ac3ef2a4
ORCID for Adriane Chapman: orcid.org/0000-0002-3814-2587
ORCID for Elena Simperl: orcid.org/0000-0003-1722-947X

Catalogue record

Date deposited: 09 Sep 2019 16:30
Last modified: 12 Nov 2019 01:38

Contributors

Author: Adriane Chapman
Author: Elena Simperl
Author: Laura Koesten
Author: Georgios Konstantinidis
Author: Luis Ibanez Gonzalez
Author: Emilia Kacprzak
Author: Paul Groth

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of https://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.