Considering assessor agreement in IR evaluation
Maddalena, Eddy, Roitero, Kevin, Demartini, Gianluca and Mizzaro, Stefano (2017) Considering assessor agreement in IR evaluation. In ICTIR 2017: Proceedings of the 2017 ACM SIGIR International Conference on the Theory of Information Retrieval. Association for Computing Machinery, pp. 75-82. (doi:10.1145/3121050.3121060)
Record type: Conference or Workshop Item (Paper)
Abstract
The agreement between relevance assessors is an important but understudied topic in the Information Retrieval literature, mainly because of the limited data available about documents assessed by multiple judges. The issue has gained even more importance recently in light of crowdsourced relevance judgments, where it is customary to gather many relevance labels for each topic-document pair. In a crowdsourcing setting, agreement is often even used as a proxy for quality, although without any systematic verification of the conjecture that higher agreement corresponds to higher quality. In this paper we address this issue and study in particular: the effect of topic on assessor agreement; the relationship between assessor agreement and judgment quality; the effect of agreement on the ranking of systems by their effectiveness; and the definition of an agreement-aware effectiveness metric that does not discard information about multiple judgments for the same document, as typically happens in a crowdsourcing setting.
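The abstract does not name a specific agreement measure, so the following Python sketch is purely illustrative: it computes Fleiss' kappa, a standard chance-corrected agreement statistic, over hypothetical crowd relevance labels for topic-document pairs. The function name, the example data, and the choice of measure are assumptions for illustration, not taken from the paper.

from collections import Counter

def fleiss_kappa(label_matrix):
    """Fleiss' kappa for items each judged by the same number of assessors.
    label_matrix: list of rows, one per topic-document pair; each row is the
    list of relevance labels given by the assessors (illustrative only)."""
    n_items = len(label_matrix)
    n_raters = len(label_matrix[0])
    categories = sorted({lab for labels in label_matrix for lab in labels})

    # n_ij: how many assessors put item i into category j
    counts = [Counter(labels) for labels in label_matrix]

    # Per-item observed agreement P_i, then its mean P_bar
    p_i = [
        (sum(c ** 2 for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_bar = sum(p_i) / n_items

    # Expected agreement P_e from the marginal category proportions
    p_j = [sum(cnt[cat] for cnt in counts) / (n_items * n_raters) for cat in categories]
    p_e = sum(p ** 2 for p in p_j)

    return (p_bar - p_e) / (1 - p_e)

# Hypothetical crowd judgments: rows are topic-document pairs, columns are
# binary relevance labels (0 = non-relevant, 1 = relevant) from three assessors.
judgments = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 0],
]
print(f"Fleiss' kappa: {fleiss_kappa(judgments):.3f}")

Values of kappa near 1 indicate near-perfect agreement and values near 0 indicate agreement no better than chance; whether such a score is a good proxy for judgment quality is exactly the conjecture the paper examines.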
More information
Published date: 1 October 2017
Venue - Dates:
7th ACM SIGIR International Conference on the Theory of Information Retrieval, ICTIR 2017, Amsterdam, Netherlands, 2017-10-01 - 2017-10-04
Keywords:
Agreement, Disagreement, Evaluation, Test collections, TREC
Identifiers
Local EPrints ID: 420463
URI: http://eprints.soton.ac.uk/id/eprint/420463
PURE UUID: 9a8a78c4-d637-46d5-bce1-60cf8612ada7
Catalogue record
Date deposited: 08 May 2018 16:30
Last modified: 17 Mar 2024 12:03
Contributors
Author: Eddy Maddalena
Author: Kevin Roitero
Author: Gianluca Demartini
Author: Stefano Mizzaro