The University of Southampton
University of Southampton Institutional Repository

Everything you always wanted to know about a dataset: studies in data summarisation

Summarising data as text helps people make sense of it. It also improves data discovery, as search algorithms can match this text against keyword queries. In this paper, we explore the characteristics of text summaries of data in order to understand what meaningful summaries look like. We present two complementary studies: a data-search diary study with 69 students, which offers insight into the information needs of people searching for data; and a summarisation study, with a lab component and a crowdsourcing component, in which 80 data-literate participants overall produced summaries for 25 datasets. In each study we carried out a qualitative analysis to identify key themes and commonly mentioned dataset attributes that people consider when searching for and making sense of data. The results helped us design a template to create more meaningful textual representations of data, alongside guidelines for improving the data-search experience overall.


Koesten, Laura, Simperl, Elena, Kacprzak, Emilia, Magdalena, Blount, Thomas and Tennison, Jeni (2020) Everything you always wanted to know about a dataset: studies in data summarisation. International Journal of Human-Computer Studies, 135, [102367]. (doi:10.1016/j.ijhcs.2019.10.004).

Record type: Article

More information

Accepted/In Press date: 14 October 2019
e-pub ahead of print date: 14 October 2019
Published date: March 2020
Additional Information: Funding Information: This project was supported by the European Union Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 642795, and No 780247, and by the EPSRC Data Stories project EP/P025676/1. We thank our participants for taking part in this study. Publisher Copyright: © 2019 Elsevier Ltd
Keywords: Data search, Data sensemaking, Data summarisation, Human data interaction

Identifiers

Local EPrints ID: 436246
URI: http://eprints.soton.ac.uk/id/eprint/436246
ISSN: 1071-5819
PURE UUID: 23d95d48-3905-4f61-a79b-0248b5a98718
ORCID for Elena Simperl: orcid.org/0000-0003-1722-947X
ORCID for Thomas Blount: orcid.org/0000-0002-4879-5012

Catalogue record

Date deposited: 04 Dec 2019 17:30
Last modified: 17 Mar 2024 05:06

Contributors

Author: Laura Koesten
Author: Elena Simperl
Author: Emilia Magdalena Kacprzak
Author: Thomas Blount
Author: Jeni Tennison
