Characterising dataset search – an analysis of search logs and data requests
Pages: 37-55
Kacprzak, Emilia, Magdalena
fdc38ad7-6879-4769-ad65-5d3582690af2
Koesten, Laura, Mylena
a3426c32-31d1-47a4-b500-f237b4e74084
Ibáñez, Luis-Daniel
65a2e20b-74a9-427d-8c4c-2330285153ed
Blount, Tom
7c4e5a1d-d105-4c18-8f02-42bd65e3f3a7
Tennison, Jeni
abfdd103-6089-427d-babb-56448595f2fa
Simperl, Elena
40261ae4-c58c-48e4-b78b-5187b10e4f67
March 2019
Kacprzak, Emilia, Magdalena, Koesten, Laura, Mylena, Ibáñez, Luis-Daniel, Blount, Tom, Tennison, Jeni and Simperl, Elena
(2019)
Characterising dataset search – an analysis of search logs and data requests.
Journal of Web Semantics, 55, 37-55.
(doi:10.1016/j.websem.2018.11.003).
Abstract
Large amounts of data are becoming increasingly available online. To benefit from them, we need tools that retrieve the datasets most relevant to one's data needs. Several vocabularies have been developed to describe datasets and increase their discoverability. However, annotating datasets with all of them is costly and cumbersome for data publishers, which raises the question of which properties matter most. In this work we contribute a systematic study of the patterns and specific attributes that data consumers use to search for data, and of how these compare with general web search. We performed a query log analysis based on logs from four national open data portals. We also conducted a qualitative analysis of user data requests issued to one of those portals. Search queries issued on data portals differ from those issued to web search engines in their length, topic, and structure. Based on our findings, we hypothesise that portal search functionalities are currently used in an exploratory manner rather than to retrieve a specific resource. In our study of data requests, we found that geospatial and temporal attributes, as well as information on the required granularity of the data, are the most common features. The findings of both analyses suggest that these features are more important in dataset retrieval than in general web search. Dataset publishers should therefore focus their efforts on producing dataset descriptions that include them.
Text: kacprzak-dataset-search (Accepted Manuscript)
More information
Accepted/In Press date: 14 November 2018
e-pub ahead of print date: 19 November 2018
Published date: March 2019
Keywords: DatasetSearch, VerticalSearch, SearchLogs
Identifiers
Local EPrints ID: 426734
URI: http://eprints.soton.ac.uk/id/eprint/426734
ISSN: 1570-8268
PURE UUID: ed484516-fae2-490b-8012-06369bf259fc
Catalogue record
Date deposited: 11 Dec 2018 17:30
Last modified: 16 Mar 2024 07:23
Contributors
Author: Emilia, Magdalena Kacprzak
Author: Laura, Mylena Koesten
Author: Luis-Daniel Ibáñez
Author: Tom Blount
Author: Jeni Tennison
Author: Elena Simperl