The University of Southampton
University of Southampton Institutional Repository

Characterising dataset search – an analysis of search logs and data requests


Kacprzak, Emilia Magdalena, Koesten, Laura Mylena, Ibáñez, Luis-Daniel, Blount, Tom, Tennison, Jeni and Simperl, Elena (2019) Characterising dataset search – an analysis of search logs and data requests. Journal of Web Semantics, 55, 37-55. (doi:10.1016/j.websem.2018.11.003).

Record type: Article

Abstract

Large amounts of data are becoming increasingly available online. To benefit from them, we need tools that retrieve the datasets most relevant to one's data needs. Several vocabularies have been developed to describe datasets in order to increase their discoverability, but it is costly and cumbersome for data publishers to annotate datasets using all of them, which raises the question of which properties are most important. In this work we contribute a systematic study of the patterns and specific attributes that data consumers use to search for data, and of how dataset search compares with general web search. We performed a query log analysis based on logs from four national open data portals and conducted a qualitative analysis of user data requests issued to one of them. Search queries issued on data portals differ from those issued to web search engines in their length, topic, and structure. Based on our findings we hypothesise that portals' search functionalities are currently used in an exploratory manner, rather than to retrieve a specific resource. In our study of data requests we found that geospatial and temporal attributes, as well as information on the required granularity of the data, are the most common features. The findings of both analyses suggest that these features are of higher importance in dataset retrieval than in general web search, and that dataset publishers should therefore focus on generating dataset descriptions that include them.

Text
kacprzak-dataset-search - Accepted Manuscript

More information

Accepted/In Press date: 14 November 2018
e-pub ahead of print date: 19 November 2018
Published date: March 2019
Keywords: DatasetSearch, VerticalSearch, SearchLogs

Identifiers

Local EPrints ID: 426734
URI: http://eprints.soton.ac.uk/id/eprint/426734
ISSN: 1570-8268
PURE UUID: ed484516-fae2-490b-8012-06369bf259fc
ORCID for Luis-Daniel Ibáñez: orcid.org/0000-0001-6993-0001
ORCID for Elena Simperl: orcid.org/0000-0003-1722-947X

Catalogue record

Date deposited: 11 Dec 2018 17:30
Last modified: 16 Mar 2024 07:23

Contributors

Author: Emilia Magdalena Kacprzak
Author: Laura Mylena Koesten
Author: Luis-Daniel Ibáñez
Author: Tom Blount
Author: Jeni Tennison
Author: Elena Simperl
