The University of Southampton
University of Southampton Institutional Repository

An overview of the National COVID-19 Chest Imaging Database: data quality and cohort analysis

Background: the National COVID-19 Chest Imaging Database (NCCID) is a centralized database containing mainly chest X-rays and computed tomography scans from patients across the UK. The objective of the initiative is to support a better understanding of the coronavirus SARS-CoV-2 disease (COVID-19) and the development of machine learning technologies that will improve care for patients hospitalized with a severe COVID-19 infection. This article introduces the training dataset, including a snapshot analysis covering the completeness of clinical data, and availability of image data for the various use-cases (diagnosis, prognosis, longitudinal risk). An additional cohort analysis measures how well the NCCID represents the wider COVID-19-affected UK population in terms of geographic, demographic, and temporal coverage.

Findings: the NCCID offers high-quality DICOM images acquired across a variety of imaging machinery; multiple time points including historical images are available for a subset of patients. This volume and variety make the database well suited to development of diagnostic/prognostic models for COVID-associated respiratory conditions. Historical images and clinical data may aid long-term risk stratification, particularly as availability of comorbidity data increases through linkage to other resources. The cohort analysis revealed good alignment to general UK COVID-19 statistics for some categories, e.g., sex, whilst identifying areas for improvements to data collection methods, particularly geographic coverage.

Conclusions: the NCCID is a growing resource that provides researchers with a large, high-quality database that can be leveraged both to support the response to the COVID-19 pandemic and as a test bed for building clinically viable medical imaging models.


Cushnan, Dominic, Bennett, Oscar and Berka, Rosalind, NCCID Collaborative (2021) An overview of the National COVID-19 Chest Imaging Database: data quality and cohort analysis. GigaScience, 10 (11). (doi:10.1093/gigascience/giab076).

Record type: Article

Text
giab076 - Version of Record
Available under License Creative Commons Attribution.

More information

e-pub ahead of print date: 25 November 2021
Additional Information: Funding: The NCCID is publicly funded by NHSX. J.J. was supported by a Wellcome Trust Clinical Research Career Development Fellowship (209553/Z/17/Z) and by the NIHR BRC at UCL.
Keywords: COVID-19, Cohort Studies, Data Accuracy, Humans, Pandemics, SARS-CoV-2, Tomography, X-Ray Computed

Identifiers

Local EPrints ID: 454527
URI: http://eprints.soton.ac.uk/id/eprint/454527
ISSN: 2047-217X
PURE UUID: daeac001-b8f7-495e-a9d6-8899584ee34e
ORCID for Jeremy C Wyatt: orcid.org/0000-0001-7008-1473

Catalogue record

Date deposited: 15 Feb 2022 17:38
Last modified: 17 Mar 2024 03:40

Contributors

Author: Dominic Cushnan
Author: Oscar Bennett
Author: Rosalind Berka
Author: Ottavia Bertolli
Author: Ashwin Chopra
Author: Samie Dorgham
Author: Alberto Favaro
Author: Tara Ganepola
Author: Mark Halling-Brown
Author: Gergely Imreh
Author: Joseph Jacob
Author: Emily Jefferson
Author: François Lemarchand
Author: Daniel Schofield
Author: Jeremy C Wyatt
Corporate Author: NCCID Collaborative
