Abstract
Background: inflammatory bowel disease (IBD) research is a dynamic field. However, the growing volume of electronic health records (EHRs) and research data presents significant challenges. Traditional methods for structuring unstructured medical records are labour-intensive and lack scalability. Large language models (LLMs) may present a solution, yet their usefulness in data standardisation in the context of IBD remains unknown.
Objective: to evaluate the use of LLMs in structuring free-text histology and radiology reports from IBD patients, compare their performance to manual clinician curation, and assess the usefulness of fine-tuning and retrieval-augmented generation (RAG).
Design: we developed an IBD-specialised LLM-based framework using structured prompt engineering and fine-tuning. Reports were manually curated by clinicians and processed using several LLMs. Model performance was assessed against the manual curation, and RAG was used to enhance model responses with clinical guidelines from the European Crohn’s and Colitis Organisation (ECCO) and the European Society for Paediatric Gastroenterology, Hepatology and Nutrition (ESPGHAN).
Results: overall, Llama 3.3 achieved the highest F1 scores for histology and imaging reports (1 ± 0 and 0.85 ± 0.29, respectively) in extracting findings and anatomical regions, outperforming the other models in structured data generation. Fine-tuning improved the performance of the smaller Llama 3.1 8B model on imaging reports (F1 0.7 ± 0.46 before vs 0.82 ± 0.35 after fine-tuning), enabling better extraction with reduced computational requirements.
Conclusion: our findings demonstrate the feasibility of LLM-based automated structuring of IBD-related medical records. Unstructured data from free-text reports can be reliably converted to standardised ontologies with location, severity, and qualifiers. These advances enable scalable, privacy-compliant, AI-driven solutions for data standardisation.
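The abstract reports F1 as mean ± SD for extracting findings and anatomical regions but does not state how matches are counted. Purely as an illustration, the sketch below assumes each report is reduced to a set of (finding, anatomical region) pairs, scores the model output against the clinician-curated reference for each report, and then summarises scores across reports. The record keys (`finding`, `region`, `severity`), the lower-cased exact-string matching, the convention that an empty report pair scores 1.0, and the use of a population SD are all hypothetical choices, not the authors' evaluation protocol.

```python
from statistics import mean, pstdev

# Hypothetical record shape: each extracted item is a dict with at least
# "finding" and "region" keys. Other keys (e.g. "severity") are carried
# along but, by assumption, ignored when matching.
Record = dict[str, object]


def _keys(records: list[Record]) -> set[tuple[str, str]]:
    """Normalise records to lower-cased (finding, region) pairs."""
    return {
        (str(r["finding"]).strip().lower(), str(r["region"]).strip().lower())
        for r in records
    }


def report_f1(predicted: list[Record], reference: list[Record]) -> float:
    """F1 between model output and clinician curation for one report."""
    pred, ref = _keys(predicted), _keys(reference)
    if not pred and not ref:
        return 1.0  # assumption: nothing to extract and nothing extracted
    tp = len(pred & ref)  # pairs found by both model and clinician
    # 2*TP / (|pred| + |ref|) equals 2PR/(P+R) and is 0.0 when TP == 0.
    return 2 * tp / (len(pred) + len(ref))


def summarise(scores: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation of per-report F1 scores."""
    return mean(scores), pstdev(scores)


# Illustrative (invented) example for a single report.
reference = [
    {"finding": "active colitis", "region": "sigmoid colon", "severity": "moderate"},
    {"finding": "granuloma", "region": "terminal ileum"},
]
predicted = [
    {"finding": "Active colitis", "region": "sigmoid colon", "severity": "mild"},
]
print(report_f1(predicted, reference))  # 0.6666666666666666
```

Because severity is excluded from the match key in this sketch, the severity disagreement in the example does not lower the score; a stricter variant could add `severity` to the tuple built in `_keys`.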
