University of Southampton Institutional Repository

Leveraging Wikidata for biomedical entity linking in a low-resource setting: a case study for German

Biomedical Entity Linking (BEL) is a challenging task for low-resource languages, due to the lack of appropriate resources: datasets, knowledge bases (KBs), and pre-trained models. In this paper, we propose an approach to create a biomedical knowledge base for German BEL using UMLS information from Wikidata that provides good coverage and can be easily extended to further languages. As a further contribution, we adapt several existing approaches for use in the German BEL setup and report on their results. The chosen methods include a sparse model using character n-grams, a multilingual biomedical entity linker, and two general-purpose text retrieval models. Our results show that a language-specific KB that provides good coverage leads to the greatest improvement in entity linking performance, irrespective of the model used. The fine-tuned German BEL model, the newly created UMLS Wikidata KB, and the code to reproduce our results are publicly available.
Mustafa, Faizan E
2956b3cf-bba9-465d-b164-6c90f95253e5
Dima, Corina
ce7f72c3-2a1a-4c62-b862-2cce9a650ffe
Diaz Ochoa, Juan G.
7826d0d1-2798-4906-979c-8be29d14a3fb
Staab, Steffen
bf48d51b-bd11-4d58-8e1c-4e6e03b30c49

Mustafa, Faizan E, Dima, Corina, Diaz Ochoa, Juan G. and Staab, Steffen (2024) Leveraging Wikidata for biomedical entity linking in a low-resource setting: a case study for German. 6th Clinical Natural Language Processing Workshop, at NAACL 2024, Mexico City, Mexico. 20 - 21 Jun 2024. 6 pp. (In Press)

Record type: Conference or Workshop Item (Paper)

Text
camera-ready-29_leveraging_wikidata_for_biomed - Accepted Manuscript

More information

Accepted/In Press date: 24 April 2024
Venue - Dates: 6th Clinical Natural Language Processing Workshop, at NAACL 2024, Mexico City, Mexico, 2024-06-20 - 2024-06-21

Identifiers

Local EPrints ID: 489537
URI: http://eprints.soton.ac.uk/id/eprint/489537
PURE UUID: 3fb9d8c8-d242-4c85-ba8c-d39c55e8583b
ORCID for Steffen Staab: orcid.org/0000-0002-0780-4154

Catalogue record

Date deposited: 26 Apr 2024 16:53
Last modified: 27 Apr 2024 01:53

Contributors

Author: Faizan E Mustafa
Author: Corina Dima
Author: Juan G. Diaz Ochoa
Author: Steffen Staab
