A foundation systematic review of natural language processing applied to gastroenterology & hepatology
Objective: this review assesses the progress of NLP in gastroenterology to date, grades the robustness of the methodology, exposes the field to a new generation of authors, and highlights opportunities for future research.
Design: seven scholarly databases (ACM Digital Library, arXiv, Embase, IEEE Xplore, PubMed, Scopus and Google Scholar) were searched for studies published between 2015 and 2023 that met the inclusion criteria. Studies lacking a description of appropriate validation or NLP methods were excluded, as were studies unavailable in English, those focused on non-gastrointestinal diseases and those that were duplicates. Two independent reviewers extracted study information, clinical/algorithm details, and relevant outcome data. Methodological quality and bias risks were appraised using a checklist of quality indicators for NLP studies.
Results: fifty-three studies were identified utilising NLP in endoscopy, inflammatory bowel disease, gastrointestinal bleeding, liver and pancreatic disease. Colonoscopy was the focus of 21 (38.9%) studies; 13 (24.1%) focused on liver disease, 7 (13.0%) on inflammatory bowel disease, 4 (7.4%) on gastroscopy, 4 (7.4%) on pancreatic disease and 2 (3.7%) on endoscopic sedation/ERCP and gastrointestinal bleeding. Only 30 (56.6%) of the studies reported patient demographics, and only 13 (24.5%) had a low risk of validation bias. Thirty-five (66%) studies mentioned generalisability, but only 5 (9.4%) mentioned explainability or shared code/models.
Conclusions: NLP can unlock substantial clinical information from free-text notes stored in EPRs and is already being used, particularly to interpret colonoscopy and radiology reports. However, the models we have thus far lack transparency, leading to duplication, bias, and doubts about generalisability. Therefore, greater clinical engagement, collaboration, and open sharing of appropriate datasets and code are needed.
Keywords: Colonoscopy, Gastroscopy, Hepatocellular carcinoma, Inflammatory bowel disease, Natural language processing, Pancreatic disease
Stammers, Matthew
Ramgopal, Balasubramanian
Nimako, Abigail Owusu
Batchelor, James
Vyas, Anand
Nouraei, Reza
Metcalf, Cheryl
Bachelor, James
Shepherd, Jonathan
Gwiggner, Markus
Stammers, Matthew, Ramgopal, Balasubramanian, Nimako, Abigail Owusu, Batchelor, James, Vyas, Anand, Nouraei, Reza, Metcalf, Cheryl, Bachelor, James, Shepherd, Jonathan and Gwiggner, Markus (2025) A foundation systematic review of natural language processing applied to gastroenterology & hepatology. BMC Gastroenterology, 25 (1), [58]. (doi:10.1186/s12876-025-03608-5).
e-pub ahead of print date: 6 February 2025
Identifiers
Local EPrints ID: 499004
URI: http://eprints.soton.ac.uk/id/eprint/499004
ISSN: 1471-230X
PURE UUID: 51c4d228-9bd4-4d42-bf26-c69ea64551f4
Catalogue record
Date deposited: 06 Mar 2025 17:54
Last modified: 22 Aug 2025 02:45
Contributors
Author: Matthew Stammers
Author: Balasubramanian Ramgopal
Author: Abigail Owusu Nimako
Author: Anand Vyas
Author: Reza Nouraei
Author: James Bachelor
Author: Markus Gwiggner