On fine-grained relevance scales
Roitero, Kevin, Maddalena, Eddy, Demartini, Gianluca and Mizzaro, Stefano (2018) On fine-grained relevance scales. In SIGIR 2018 - The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval. Association for Computing Machinery, pp. 675-684. (doi:10.1145/3209978.3210052).
Record type: Conference or Workshop Item (Paper)
Abstract
In Information Retrieval evaluation, the classical approach of adopting binary relevance judgments has been replaced by multi-level relevance judgments and by gain-based metrics that leverage such multi-level judgment scales. Recent work has also proposed and evaluated unbounded relevance scales by means of Magnitude Estimation (ME) and compared them with multi-level scales. While ME brings advantages, such as allowing assessors to always judge the next document as more or less relevant than any document they have judged so far, it also comes with drawbacks: for example, it is not a natural way for human assessors to judge items, unlike the rating mechanisms they are used to on the Web (e.g., 5-star ratings). In this work, we propose and experimentally evaluate a bounded, fine-grained relevance scale that retains many of the advantages of ME while addressing some of its issues. We collect relevance judgments on a 100-level relevance scale (S100) by means of a large-scale crowdsourcing experiment and compare the results with other relevance scales (binary, 4-level, and ME), showing the benefit of fine-grained scales over both coarse-grained and unbounded scales, as well as highlighting some new results on ME. Our results show that S100 maintains the flexibility of unbounded scales like ME, providing assessors with ample choice when judging document relevance (i.e., they can fit new judgments in between previously given ones). It also allows assessors to judge on a more familiar scale (e.g., on 10 levels) and to work efficiently from the very first judging task.
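As a rough illustration (not part of the paper itself), the minimal Python sketch below shows how fine-grained S100 judgments could be collapsed onto coarser scales and plugged into a simple gain-based metric such as linear-gain DCG. The mapping function and the example scores are hypothetical; the paper's own scale-comparison methodology may differ.

import math

def dcg(gains):
    """Discounted cumulative gain (linear gain) over a ranked list."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def s100_to_levels(score, levels=4):
    """Collapse an S100 judgment (0-100) onto a coarser scale.

    Hypothetical mapping: levels=4 maps 0-100 onto {0, 1, 2, 3},
    mimicking a 4-level scale; levels=2 yields binary judgments.
    """
    score = max(0, min(100, score))
    return min(levels - 1, score * levels // 101)

# Hypothetical S100 judgments for a ranked result list.
s100_judgments = [92, 40, 75, 5, 60]

print("DCG on S100 gains:", round(dcg(s100_judgments), 2))
print("DCG on 4-level gains:",
      round(dcg([s100_to_levels(s) for s in s100_judgments]), 2))
print("DCG on binary gains:",
      round(dcg([s100_to_levels(s, levels=2) for s in s100_judgments]), 2))

The point of the sketch is that the same ranked list yields different gain profiles depending on scale granularity, which is the kind of comparison (binary vs. 4-level vs. S100 vs. ME) the abstract describes.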
This record has no associated files available for download.
More information
e-pub ahead of print date: 27 June 2018
Published date: 2018
Keywords: IR Evaluation, Relevance Scales
Identifiers
Local EPrints ID: 423219
URI: http://eprints.soton.ac.uk/id/eprint/423219
PURE UUID: 7c63dc3e-40d0-4efb-a729-99fd678d5ca8
Catalogue record
Date deposited: 19 Sep 2018 16:30
Last modified: 15 Mar 2024 21:26
Contributors
Author: Kevin Roitero
Author: Eddy Maddalena
Author: Gianluca Demartini
Author: Stefano Mizzaro