University of Southampton Institutional Repository

Systematic review of natural language processing applied to gastroenterology & hepatology: the current state of the art

Objective:

This review assesses the progress of natural language processing (NLP) in gastroenterology to date, grades the robustness of the methodology used, introduces the field to a new generation of authors, and highlights opportunities for future research.

Design:

Seven scholarly databases (ACM Digital Library, arXiv, Embase, IEEE Xplore, PubMed, Scopus and Google Scholar) were searched for studies published between 2015 and 2023 that met the inclusion criteria. Studies were excluded if they lacked a description of appropriate validation or NLP methods, were unavailable in English, or focused on non-gastrointestinal diseases; duplicates were also removed. Two independent reviewers extracted study information, clinical and algorithm details, and relevant outcome data. Methodological quality and risk of bias were appraised using a checklist of quality indicators for NLP studies.

Results:

Fifty-three studies were identified that applied NLP in endoscopy, inflammatory bowel disease, gastrointestinal bleeding, and liver and pancreatic disease. Colonoscopy was the focus of 21 (38.9%) studies, liver disease of 13 (24.1%), inflammatory bowel disease of 7 (13.0%), gastroscopy of 4 (7.4%), pancreatic disease of 4 (7.4%), and endoscopic sedation/ERCP and gastrointestinal bleeding of 2 (3.7%) each. Only 30 (56.6%) studies reported any patient demographics, and only 13 (24.5%) were scored as low risk of validation bias. Thirty-five (66%) studies mentioned generalisability, but only 5 (9.4%) mentioned explainability or shared their code or models.

Conclusion:

NLP can unlock substantial clinical information from free-text notes stored in electronic patient records (EPRs) and is already in use, particularly to interpret colonoscopy and radiology reports. However, existing models lack transparency, which leads to duplication, bias, and doubts about generalisability. Greater clinical engagement, collaboration, and open sharing of appropriate datasets and code are therefore needed.
Research Square
Stammers, Matthew
Ramgopal, Balasubramanian
Obeng, Abigail
Vyas, Anand
Nouraei, Reza
Metcalf, Cheryl
Batchelor, James
Shepherd, Jonathan
Gwiggner, Markus


Text: Author's Original
Available under License Creative Commons Attribution.

More information

Published date: 19 April 2024

Identifiers

Local EPrints ID: 507746
URI: http://eprints.soton.ac.uk/id/eprint/507746
PURE UUID: 78629f76-c2b9-43d0-a99a-23305422ebda
ORCID for Matthew Stammers: orcid.org/0000-0003-3850-3116
ORCID for James Batchelor: orcid.org/0000-0002-5307-552X

Catalogue record

Date deposited: 06 Jan 2026 10:52
Last modified: 08 Jan 2026 03:26
