P319 natural language processing and named entity recognition in inflammatory bowel disease referrals
Introduction: clinical natural language processing (NLP) techniques are evolving such that, over the next few years, they will start to support clinicians in interpreting clinical information. Named entity recognition and linkage (NER+L) to standard ontologies with millions of concepts, such as the Unified Medical Language System (UMLS), adds value to otherwise unstructured textual data. However, little research has been done in the field of inflammatory bowel disease (IBD).
Methods: anonymised GP referral letters, triaged between 1st January 2017 and 31st March 2021 as likely new or recurrent IBD by a panel of gastroenterologists using an agreed protocol, were randomly extracted. NLP in Python was applied to the referral free text using MedCAT, a model trained on the UMLS database.
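As a rough illustration of the NER+L step, the sketch below narrows a set of concept annotations down to the symptoms of interest. It assumes the annotations arrive as a dictionary with an `entities` mapping whose values carry `cui` and `source_value` fields, which is the shape MedCAT's `get_entities` output is understood to take. The abstract does not describe the actual pipeline, the identifiers `CUI-DIARRHOEA` and `CUI-ABDOMINAL-PAIN` are placeholders rather than real UMLS codes, and negation handling is not shown.

```python
from typing import Dict, Set

# Placeholder identifiers (hypothetical, NOT real UMLS concept codes).
TARGET_CUIS: Dict[str, str] = {
    "CUI-DIARRHOEA": "diarrhoea",
    "CUI-ABDOMINAL-PAIN": "abdominal pain",
}


def symptoms_mentioned(annotations: dict,
                       targets: Dict[str, str] = TARGET_CUIS) -> Set[str]:
    """Return the target symptoms that have at least one linked mention."""
    found: Set[str] = set()
    for entity in annotations.get("entities", {}).values():
        label = targets.get(entity.get("cui"))
        if label is not None:
            found.add(label)
    return found


# Hand-written annotations standing in for model output on one letter.
example = {
    "entities": {
        0: {"cui": "CUI-DIARRHOEA", "source_value": "loose stools"},
        1: {"cui": "CUI-OTHER", "source_value": "fatigue"},
    }
}
print(sorted(symptoms_mentioned(example)))
```

Keeping the filter separate from the model call means the same function can be run over human annotations stored in the same shape, which makes the later comparison symmetrical.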
Manual validation was performed to determine sensitivity against ground truth for finding positive mentions of four cardinal clinical signs and symptoms. Sensitivity = TP/(TP + FN), where TP is the number of true positives and FN the number of false negatives, was the outcome of greatest interest. The chi-squared test was used for statistical comparison at the p<0.05 level.
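The two statistics named above can be sketched in a few lines of Python. This is a minimal illustration only: the labels and counts are made up and are not study data, the function names are hypothetical, and the abstract does not say how the comparison table for the chi-squared test was constructed, so the 2x2 layout here is an assumption.

```python
import math
from typing import Sequence, Tuple


def sensitivity(human: Sequence[bool], model: Sequence[bool]) -> float:
    """Sensitivity = TP / (TP + FN), treating human labels as ground truth."""
    tp = sum(1 for h, m in zip(human, model) if h and m)
    fn = sum(1 for h, m in zip(human, model) if h and not m)
    if tp + fn == 0:
        raise ValueError("no positive ground-truth labels")
    return tp / (tp + fn)


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-squared (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper tail is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Made-up labels for six letters: does each mention a given symptom?
human = [True, True, True, True, False, False]
model = [True, True, False, True, False, True]
sens = sensitivity(human, model)  # 3 true positives, 1 false negative

# Made-up counts: rows = rater, columns = (mention found, not found).
stat, p_value = chi2_2x2(40, 60, 20, 80)
print(f"sensitivity={sens:.2f}, chi2={stat:.2f}, significant={p_value < 0.05}")
```

Because the human and model labels refer to the same letters, a paired test could also be argued for; the sketch follows the test named in the abstract.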
Results: 125 referral letters were included in this study. Median age was 39 (IQR 30-50) and 51.2% were male (95% CI 42.4-60.1). 22.4% (n=28) of the cohort had pre-existing IBD. Table 1 summarises the performance of the algorithm against the correct human validations: [P319 Table 1 NLP model outcome parameters not included]. Diarrhoea and abdominal pain were both the most frequently mentioned and the most successfully detected by MedCAT; however, significant differences were flagged in all cases.
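For readers who want to check the reported proportion, the sketch below computes a normal-approximation (Wald) 95% confidence interval. The count of 64 male patients is inferred from 51.2% of 125 rather than stated in the abstract, and the abstract does not say which interval method was used, so the upper bound produced here (about 60.0%) differs slightly from the reported 60.1%.

```python
import math
from typing import Tuple


def wald_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1] so the interval stays a valid proportion range.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# 64 of 125 is inferred from the reported 51.2%, not taken from the abstract.
low, high = wald_ci(64, 125)
print(f"{low:.1%} to {high:.1%}")
```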
Conclusions: significant differences were observed between human validations and model predictions for four common IBD signs and symptoms, suggesting that these models are not yet mature enough for use in clinical practice. Annotations for more difficult concepts, such as rectal bleeding and weight loss, need to be improved in major open-source NLP corpora.
A194-A195
Stammers, Matt
a4ad3bd5-7323-4a6d-9c00-2c34f8ae5bd3
George, Michael
73c78842-9866-4912-afb4-60a6d0c64822
Regas, Constantinos
eef3cec5-ec27-44f2-91d5-19bdc30285dd
May, Georgia
217900c9-cce6-4e3c-9651-b31475975400
Rahmany, Sohail
9345a4c5-0294-4edf-b8c9-8b88b1627fb7
Davis, Cai
a4aa4473-7285-41d5-8807-11717fdddb05
Borca, Florina
31fc3965-6bcf-4fd6-85bc-8b0f99f62473
Knibbs, Will
275c6a4c-138f-4e03-877f-e387de658a5f
Chandrabalan, Vishnu V.
b29c126b-e258-4584-9f69-2da5a2f7a583
Gwiggner, Markus
af72b597-1ead-4155-a25c-0835f7e560c2
Stammers, Matt, George, Michael, Regas, Constantinos, May, Georgia, Rahmany, Sohail, Davis, Cai, Borca, Florina, Knibbs, Will, Chandrabalan, Vishnu V. and Gwiggner, Markus
(2022)
P319 natural language processing and named entity recognition in inflammatory bowel disease referrals.
Gut, 71, A194-A195.
(doi:10.1136/gutjnl-2022-BSG.370).
Record type: Meeting abstract
More information
e-pub ahead of print date: 19 June 2022
Identifiers
Local EPrints ID: 478009
URI: http://eprints.soton.ac.uk/id/eprint/478009
ISSN: 1468-3288
PURE UUID: 4a61ae8d-db31-430f-b7ca-28989f6dbfed
Catalogue record
Date deposited: 19 Jun 2023 16:55
Last modified: 21 Sep 2024 02:15
Contributors
Author: Matt Stammers
Author: Michael George
Author: Constantinos Regas
Author: Georgia May
Author: Sohail Rahmany
Author: Cai Davis
Author: Florina Borca
Author: Will Knibbs
Author: Vishnu V. Chandrabalan
Author: Markus Gwiggner