NILK: entity linking dataset targeting NIL-linking cases
The NIL-linking task in Entity Linking deals with text mentions that have no corresponding entity in the associated knowledge base. NIL-linking has two sub-tasks: NIL-detection and NIL-disambiguation. NIL-detection identifies NIL-mentions in the text; NIL-disambiguation then determines whether some of these NIL-mentions refer to the same out-of-knowledge-base entity. Although multiple existing datasets can be adapted for NIL-detection, none of them address NIL-disambiguation. This paper presents NILK, a new dataset for NIL-linking, constructed from WikiData and Wikipedia dumps taken at two different timestamps. NILK has two main features: 1) it marks NIL-mentions for NIL-detection by extracting, from Wikipedia text, mentions that belong to newly added entities; 2) it provides an entity label for NIL-disambiguation by tagging each NIL-mention with its WikiData ID from the newer dump. We release the annotated dataset along with the code. The NILK dataset is available at: https://zenodo.org/record/6607514.
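The two labelling rules above can be illustrated with a short Python sketch. This is not the authors' released code. The `Mention` record, the functions `label_nil_mentions` and `nil_clusters`, and the toy IDs `Q_A`, `Q_B`, `Q_C` are all hypothetical. The sketch makes two assumptions:

- each mention already carries the WikiData ID it links to in the newer dump;
- the sets of entity IDs present in the older and newer dumps are available.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mention:
    """Hypothetical mention record: the surface text plus the WikiData ID it
    links to in the newer dump (None if the mention carries no link)."""
    doc_id: str
    surface: str
    qid: Optional[str]


def label_nil_mentions(mentions, old_kb_qids, new_kb_qids):
    """Split linked mentions into in-KB mentions and NIL-mentions (NIL-detection).

    Illustrative assumption: a mention is NIL if its entity exists in the newer
    dump but not in the older one, i.e. the entity was added between the two
    timestamps. Mentions whose ID is absent from the newer dump are skipped.
    """
    in_kb, nil = [], []
    for m in mentions:
        if m.qid is None or m.qid not in new_kb_qids:
            continue  # no usable label from the newer dump
        if m.qid in old_kb_qids:
            in_kb.append(m)  # entity already known at the older timestamp
        else:
            nil.append(m)  # entity is out-of-KB with respect to the older dump
    return in_kb, nil


def nil_clusters(nil_mentions):
    """Group NIL-mentions by their newer-dump ID (NIL-disambiguation target):
    mentions in the same group refer to the same out-of-KB entity."""
    clusters = defaultdict(list)
    for m in nil_mentions:
        clusters[m.qid].append(m)
    return dict(clusters)


if __name__ == "__main__":
    # Toy data: Q_A exists in both dumps; Q_B and Q_C appear only in the newer one.
    old_kb = {"Q_A"}
    new_kb = {"Q_A", "Q_B", "Q_C"}
    mentions = [
        Mention("d1", "Alpha Corp", "Q_A"),
        Mention("d1", "Beta", "Q_B"),
        Mention("d2", "Beta Inc.", "Q_B"),
        Mention("d2", "Gamma", "Q_C"),
        Mention("d3", "unlinked text", None),
    ]
    in_kb, nil = label_nil_mentions(mentions, old_kb, new_kb)
    for qid, group in sorted(nil_clusters(nil).items()):
        print(qid, [m.surface for m in group])
    # Q_B ['Beta', 'Beta Inc.']
    # Q_C ['Gamma']
```

Keying the clusters on the newer-dump ID mirrors the idea that the later knowledge base supplies the gold identity of entities that were unknown at the earlier timestamp.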
Iurshina, Anastasiia
Boutalbi, Rafika
Pan, Jiaxin
Staab, Steffen
Iurshina, Anastasiia, Boutalbi, Rafika, Pan, Jiaxin and Staab, Steffen (2022) NILK: entity linking dataset targeting NIL-linking cases. International Conference on Information and Knowledge Management, Atlanta, United States, 17-21 Oct 2022. 5 pp. (In Press)
Record type: Conference or Workshop Item (Paper)
More information
Accepted/In Press date: 4 August 2022
Venue - Dates: International Conference on Information and Knowledge Management, Atlanta, United States, 2022-10-17 to 2022-10-21
Identifiers
Local EPrints ID: 470716
URI: http://eprints.soton.ac.uk/id/eprint/470716
PURE UUID: a8b87ca0-d6cf-41ab-a27e-5d4fded184d9
Catalogue record
Date deposited: 18 Oct 2022 17:06
Last modified: 17 Mar 2024 03:38