The University of Southampton
University of Southampton Institutional Repository

FlexiTerm: a flexible term recognition method

Spasic, Irena, Greenwood, Mark, Preece, Alun, Francis, Nick and Elwyn, Glyn (2013) FlexiTerm: a flexible term recognition method. Journal of Biomedical Semantics, 4, [27]. (doi:10.1186/2041-1480-4-27).

Record type: Article

Abstract

Background: The increasing amount of textual information in biomedicine requires effective term recognition methods to identify textual representations of domain-specific concepts as the first step toward automating its semantic interpretation. The dictionary look-up approaches may not always be suitable for dynamic domains such as biomedicine or the newly emerging types of media such as patient blogs, the main obstacles being the use of non-standardised terminology and high degree of term variation.

Results: In this paper, we describe FlexiTerm, a method for automatic term recognition from a domain-specific corpus, and evaluate its performance against five manually annotated corpora. FlexiTerm performs term recognition in two steps: linguistic filtering is used to select term candidates followed by calculation of termhood, a frequency-based measure used as evidence to qualify a candidate as a term. In order to improve the quality of termhood calculation, which may be affected by the term variation phenomena, FlexiTerm uses a range of methods to neutralise the main sources of variation in biomedical terms. It manages syntactic variation by processing candidates using a bag-of-words approach. Orthographic and morphological variations are dealt with using stemming in combination with lexical and phonetic similarity measures. The method was evaluated on five biomedical corpora. The highest values for precision (94.56%), recall (71.31%) and F-measure (81.31%) were achieved on a corpus of clinical notes.

Conclusions: FlexiTerm is an open-source software tool for automatic term recognition. It incorporates a simple term variant normalisation method. The method proved to be more robust than the baseline against less formally structured texts, such as those found in patient blogs or medical notes. The software can be downloaded freely at http://www.cs.cf.ac.uk/flexiterm.
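
The bag-of-words idea in the abstract can be illustrated with a minimal Python sketch. It is not the FlexiTerm implementation: the stop-word list, the toy suffix stripper `crude_stem` (a stand-in for a real stemmer) and all function names are assumptions made for illustration, and the lexical and phonetic similarity steps are left out. The sketch maps each term candidate to an unordered set of stems, so that syntactic and simple morphological variants fall into one group whose frequencies could then be pooled for a termhood calculation. The helper `f_measure` assumes the standard balanced F-measure (the harmonic mean of precision and recall), which is consistent with the figures reported for the clinical notes corpus.

```python
from collections import defaultdict

# Hypothetical stop-word list; the real method's filtering is not specified here.
STOPWORDS = {"of", "the", "in", "a", "an"}


def crude_stem(token: str) -> str:
    """Toy stemmer (assumption): lowercases and strips a plural 's'."""
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalise(candidate: str) -> frozenset:
    """Represent a term candidate as an unordered bag (set) of stems."""
    return frozenset(
        crude_stem(t) for t in candidate.split() if t.lower() not in STOPWORDS
    )


def group_variants(candidates):
    """Group candidates that share the same normalised form."""
    groups = defaultdict(list)
    for candidate in candidates:
        groups[normalise(candidate)].append(candidate)
    return dict(groups)


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    variants = [
        "infection of the respiratory tract",
        "respiratory tract infections",
        "chest pain",
    ]
    for key, members in group_variants(variants).items():
        print(sorted(key), members)
    print(round(f_measure(94.56, 71.31), 2))
```

With these inputs the first two candidates end up in the same group, and the F-measure computed from the reported precision and recall rounds to the reported 81.31.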

Text
2041-1480-4-27 - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 3 October 2013
Published date: 10 October 2013

Identifiers

Local EPrints ID: 436303
URI: http://eprints.soton.ac.uk/id/eprint/436303
PURE UUID: 73dc4ee8-f7cf-4efe-a236-6e073ea15d32
ORCID for Nick Francis: orcid.org/0000-0001-8939-7312

Catalogue record

Date deposited: 06 Dec 2019 17:30
Last modified: 17 Mar 2024 03:58

Contributors

Author: Irena Spasic
Author: Mark Greenwood
Author: Alun Preece
Author: Nick Francis
Author: Glyn Elwyn
