Finding good enough: A task-based evaluation of query biased summarization for cross-language information retrieval
Williams, Jennifer, Tam, Sharon and Shen, Wade
(2014)
Finding good enough: A task-based evaluation of query biased summarization for cross-language information retrieval.
In Empirical Methods in Natural Language Processing (EMNLP), pp. 657-669.
Association for Computational Linguistics (ACL).
(doi:10.3115/v1/D14-1).
Record type: Conference or Workshop Item (Paper)
Abstract
In this paper we present our task-based evaluation of query biased summarization for cross-language information retrieval (CLIR) using relevance prediction. We describe our 13 summarization methods, each drawn from one of four summarization strategies. We show how well our methods perform using Farsi text from the CLEF 2008 shared task, which we translated to English automatically. We report precision, recall, F1, accuracy, and time-on-task. We found that different summarization methods perform best on different evaluation metrics, but that overall query biased word clouds are the best summarization strategy. In our analysis, we demonstrate that applying the ROUGE metric to our sentence-based summaries cannot make the same kinds of distinctions as our evaluation framework does. Finally, we present our recommendations for creating much-needed evaluation standards and datasets.
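As a purely illustrative aside on the metrics named in the abstract (precision, recall, F1, and accuracy over binary relevance predictions), the Python sketch below shows one standard way such scores could be computed; the function name and sample judgments are hypothetical and are not taken from the paper or the CLEF 2008 data.

    # Illustrative only: scoring binary relevance predictions with
    # precision, recall, F1, and accuracy. All data here is made up.

    def relevance_metrics(gold, predicted):
        """Compute precision, recall, F1, and accuracy for binary labels."""
        tp = sum(1 for g, p in zip(gold, predicted) if g and p)
        fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
        fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
        tn = sum(1 for g, p in zip(gold, predicted) if not g and not p)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        accuracy = (tp + tn) / len(gold)
        return precision, recall, f1, accuracy

    # Hypothetical gold judgments vs. a user's relevance predictions
    # made after reading query-biased summaries.
    gold = [1, 1, 0, 1, 0, 0, 1, 0]
    pred = [1, 0, 0, 1, 1, 0, 1, 0]
    p, r, f1, acc = relevance_metrics(gold, pred)
    print(f"P={p:.2f} R={r:.2f} F1={f1:.2f} Acc={acc:.2f}")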
More information
Published date: 29 October 2014
Identifiers
Local EPrints ID: 470344
URI: http://eprints.soton.ac.uk/id/eprint/470344
PURE UUID: 0018220f-f141-436f-9a0e-6df087010cfa
Catalogue record
Date deposited: 06 Oct 2022 17:03
Last modified: 20 Jul 2024 02:07
Contributors
Author: Jennifer Williams
Author: Sharon Tam
Author: Wade Shen