Leveraging Wikidata for biomedical entity linking in a low-resource setting: a case study for German
Biomedical Entity Linking (BEL) is a challenging task for low-resource languages, due to the lack of appropriate resources: datasets, knowledge bases (KBs), and pre-trained models. In this paper, we propose an approach to create a biomedical knowledge base for German BEL using UMLS information from Wikidata that provides good coverage and can be easily extended to further languages. As a further contribution, we adapt several existing approaches for use in the German BEL setup and report on their results. The chosen methods include a sparse model using character n-grams, a multilingual biomedical entity linker, and two general-purpose text retrieval models. Our results show that a language-specific KB that provides good coverage leads to the most improvement in entity linking performance, irrespective of the model used. The fine-tuned German BEL model, the newly created UMLS Wikidata KB, and the code to reproduce our results are publicly available.
Mustafa, Faizan E
2956b3cf-bba9-465d-b164-6c90f95253e5
Dima, Corina
ce7f72c3-2a1a-4c62-b862-2cce9a650ffe
Diaz Ochoa, Juan G.
7826d0d1-2798-4906-979c-8be29d14a3fb
Staab, Steffen
bf48d51b-bd11-4d58-8e1c-4e6e03b30c49
Mustafa, Faizan E, Dima, Corina, Diaz Ochoa, Juan G. and Staab, Steffen
(2024)
Leveraging Wikidata for biomedical entity linking in a low-resource setting: a case study for German.
6th Clinical Natural Language Processing Workshop. At NAACL 2024, Mexico City, Mexico.
20 - 21 Jun 2024.
6 pp.
(In Press)
Record type:
Conference or Workshop Item
(Paper)
Text
camera-ready-29_leveraging_wikidata_for_biomed
- Accepted Manuscript
More information
Accepted/In Press date: 24 April 2024
Venue - Dates:
6th Clinical Natural Language Processing Workshop. At NAACL 2024, Mexico City, Mexico, 2024-06-20 - 2024-06-21
Identifiers
Local EPrints ID: 489537
URI: http://eprints.soton.ac.uk/id/eprint/489537
PURE UUID: 3fb9d8c8-d242-4c85-ba8c-d39c55e8583b
Catalogue record
Date deposited: 26 Apr 2024 16:53
Last modified: 27 Apr 2024 01:53
Contributors
Author:
Faizan E Mustafa
Author:
Corina Dima
Author:
Juan G. Diaz Ochoa
Author:
Steffen Staab