The University of Southampton
University of Southampton Institutional Repository

Learning from the past, structuring the future: using large language models to unlock a century of paediatric research in Archives of Disease in Childhood

Green, Zachary, Beattie, Robert M and Ashton, James (2025) Learning from the past, structuring the future: using large language models to unlock a century of paediatric research in Archives of Disease in Childhood. Archives of Disease in Childhood, [archdischild-2025-329505]. (doi:10.1136/archdischild-2025-329505).

Record type: Article

Abstract

Background and aims: the centenary of Archives of Disease in Childhood (ADC) presents an opportunity to reflect on a century of paediatric research and consider how best to leverage this ever-growing repository for future use. While content is indexed via PubMed and Medical Subject Headings (MeSH) terms, this indexing gives only a superficial representation of complex journal content, which limits accessibility. We discuss the potential utility of large language models (LLMs), advanced artificial intelligence systems that can understand, summarise and generate human-like language, and demonstrate their feasibility for structuring historical ADC articles, proposing a future pipeline to enhance indexing, retrieval and discoverability.

Methods: for demonstrative purposes, five articles from the December 1999 issue of ADC were downloaded locally and processed using a closed deployment of an LLM, Mistral (V.0.3, 7B). A structured prompt was used to extract key metadata. Outputs were manually compared with source texts and scored for accuracy. Hallucinations (fabricated or incorrect outputs) were recorded.
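
The extraction and scoring steps described in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code: the field names in `FIELDS`, the prompt wording and the scoring rule (the percentage of fields a reviewer marks correct, averaged across articles) are all assumptions made for illustration, since the record does not state the actual prompt or rubric.

```python
from statistics import mean

# Hypothetical field list: the record says only that "key metadata" was extracted.
FIELDS = ["title", "authors", "study_design", "population", "outcome"]


def build_prompt(article_text: str) -> str:
    """Build a structured prompt asking for one answer line per field."""
    field_lines = "\n".join(f"- {name}:" for name in FIELDS)
    return (
        "Extract the following fields from the article. "
        "Write 'not stated' if a field is absent.\n"
        f"{field_lines}\n\nArticle:\n{article_text}"
    )


def article_accuracy(marks: dict[str, bool]) -> float:
    """Percentage of fields a human reviewer marked correct for one article."""
    if not marks:
        raise ValueError("at least one scored field is required")
    return 100.0 * sum(marks.values()) / len(marks)


def mean_accuracy(all_marks: list[dict[str, bool]]) -> float:
    """Unweighted mean of per-article accuracy (assumed scoring rule)."""
    return mean(article_accuracy(marks) for marks in all_marks)
```

Under this assumed rule, each article contributes equally to the mean regardless of how many fields it has. The prompt returned by `build_prompt` would be sent to whichever locally hosted model is in use; that call is deliberately left out because the deployment details are not given.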

Results: the LLM achieved a mean accuracy of 86.9%, aligning with previous benchmarks for medical research assistance. No hallucinations were identified. Some repetition and verbosity were noted, likely due to chunk-based processing, but key fields were accurately extracted when explicitly present.
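
The link between chunk-based processing and repeated output can be sketched as follows. This is an illustrative example only: the character-based chunking, the overlap and the merge rule are assumptions, not the method used in the article. When each chunk is prompted separately, the same field can be returned several times, so a merge step that drops duplicates and "not stated" answers is one plausible way to reduce repetition.

```python
def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def merge_extractions(per_chunk: list[dict[str, str]]) -> dict[str, list[str]]:
    """Combine per-chunk field values, keeping each distinct value once."""
    merged: dict[str, list[str]] = {}
    for fields in per_chunk:
        for name, value in fields.items():
            value = value.strip()
            if not value or value.lower() == "not stated":
                continue  # skip empty answers from chunks lacking the field
            values = merged.setdefault(name, [])
            if value not in values:
                values.append(value)
    return merged
```

Exact-match deduplication is the simplest choice here; near-duplicate phrasings of the same value would still survive and would need fuzzier matching or manual review.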

Conclusion: ADC holds a vast but underutilised body of research. This article shows that lightweight, locally hosted LLMs could structure ADC content without compromising intellectual property. Such methods could enable improved access, support automation of systematic reviews and enhance discoverability through biomedical ontologies, laying the foundation for a searchable, semantically enriched Archives that bridges historical insight with modern research needs.

Text
ADC_CENTENARY_12_09_2025_REVISION_CLEAN - Accepted Manuscript

More information

Published date: 19 November 2025
Keywords: Child Health, Paediatrics, Technology

Identifiers

Local EPrints ID: 507752
URI: http://eprints.soton.ac.uk/id/eprint/507752
ISSN: 0003-9888
PURE UUID: 804c6991-e140-4cb8-896a-1b6a70ea0773
ORCID for Zachary Green: orcid.org/0000-0002-2907-5538
ORCID for James Ashton: orcid.org/0000-0003-0348-8198

Catalogue record

Date deposited: 06 Jan 2026 11:02
Last modified: 08 Jan 2026 03:25

Contributors

Author: Zachary Green
Author: Robert M Beattie
Author: James Ashton
