University of Southampton Institutional Repository

On fine-grained relevance scales

Roitero, Kevin, Maddalena, Eddy, Demartini, Gianluca and Mizzaro, Stefano (2018) On fine-grained relevance scales. In SIGIR 2018 - The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. Association for Computing Machinery. pp. 675-684. (doi:10.1145/3209978.3210052).

Record type: Conference or Workshop Item (Paper)

Abstract

In Information Retrieval evaluation, the classical approach of adopting binary relevance judgments has been replaced by multi-level relevance judgments and by gain-based metrics that leverage such multi-level judgment scales. Recent work has also proposed and evaluated unbounded relevance scales by means of Magnitude Estimation (ME) and compared them with multi-level scales. While ME brings advantages, such as allowing assessors to always judge the next document as more or less relevant than any document judged so far, it also has drawbacks. For example, it is not a natural way for human assessors to judge items, as they are accustomed to bounded scales on the Web (e.g., 5-star ratings). In this work, we propose and experimentally evaluate a bounded, fine-grained relevance scale that retains many of the advantages of ME while addressing some of its issues. We collect relevance judgments on a 100-level relevance scale (S100) by means of a large-scale crowdsourcing experiment and compare the results with other relevance scales (binary, 4-level, and ME), showing the benefit of fine-grained scales over both coarse-grained and unbounded scales, as well as highlighting some new results on ME. Our results show that S100 maintains the flexibility of unbounded scales like ME in giving assessors ample choice when judging document relevance (i.e., assessors can fit new judgments in between previously given ones). It also allows assessors to judge on a more familiar scale (e.g., on 10 levels) and to perform efficiently from the very first judging task.
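The abstract mentions gain-based metrics computed over multi-level judgment scales. As an illustrative sketch only (not the paper's code, and with hypothetical threshold choices), the snippet below shows a standard gain-based metric (DCG) over graded judgments and one simple way an S100 judgment could be collapsed onto a coarser multi-level scale:

```python
import math

def dcg(gains, k=None):
    """Discounted Cumulative Gain: sum of gain_i / log2(rank_i + 1),
    with ranks starting at 1. `gains` is a ranked list of graded judgments."""
    if k is not None:
        gains = gains[:k]
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def s100_to_levels(score, levels=4):
    """Map an S100 judgment (0-100) onto a coarser scale of `levels` grades.
    The equal-width binning here is an assumption for illustration, not the
    mapping used in the paper."""
    return min(levels - 1, score * levels // 101)

# Example: DCG for a ranked list judged on a 4-level scale (grades 0-3)
score = dcg([3, 2, 3, 0, 1])
print(round(score, 4))
```

With equal-width bins, S100 scores 0-25 fall into grade 0, 26-50 into grade 1, and so on; any monotone mapping would preserve the relative ordering of judgments.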

This record has no associated files available for download.

More information

e-pub ahead of print date: 27 June 2018
Published date: 2018
Keywords: IR Evaluation, Relevance Scales

Identifiers

Local EPrints ID: 423219
URI: http://eprints.soton.ac.uk/id/eprint/423219
PURE UUID: 7c63dc3e-40d0-4efb-a729-99fd678d5ca8

Catalogue record

Date deposited: 19 Sep 2018 16:30
Last modified: 15 Mar 2024 21:26


Contributors

Author: Kevin Roitero
Author: Eddy Maddalena
Author: Gianluca Demartini
Author: Stefano Mizzaro


