The University of Southampton
University of Southampton Institutional Repository

Topological data analysis identifies molecular phenotypes of idiopathic pulmonary fibrosis

BACKGROUND: Idiopathic pulmonary fibrosis (IPF) is a debilitating, progressive disease with a median survival time of 3-5 years. Diagnosis remains challenging and disease progression varies greatly, suggesting the possibility of distinct subphenotypes.

METHODS AND RESULTS: We analysed publicly available peripheral blood mononuclear cell expression datasets for 219 IPF, 411 asthma, 362 tuberculosis, 151 healthy, 92 HIV and 83 other disease samples, totalling 1318 patients. We integrated the datasets and split them into train (n=871) and test (n=477) cohorts to investigate the utility of a machine learning model (support vector machine) for predicting IPF. A panel of 44 genes predicted IPF in a background of healthy, tuberculosis, HIV and asthma with an area under the curve of 0.9464, corresponding to a sensitivity of 0.865 and a specificity of 0.89. We then applied topological data analysis to investigate the possibility of subphenotypes within IPF. We identified five molecular subphenotypes of IPF, one of which corresponded to a phenotype enriched for death/transplant. The subphenotypes were molecularly characterised using bioinformatic and pathway analysis tools identifying distinct subphenotype features including one which suggests an extrapulmonary or systemic fibrotic disease.

CONCLUSIONS: Integration of multiple datasets, from the same tissue, enabled the development of a model to accurately predict IPF using a panel of 44 genes. Furthermore, topological data analysis identified distinct subphenotypes of patients with IPF which were defined by differences in molecular pathobiology and clinical characteristics.
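
As an illustration of the reported evaluation metrics only, the following minimal sketch shows how sensitivity, specificity and area under the ROC curve are computed for a binary classifier. It is not the authors' pipeline: the labels, scores, the 0.5 decision threshold and the function names are hypothetical, and no data from the study are used.

```python
# Hypothetical toy example: 1 = IPF, 0 = non-IPF. Not study data.

def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Assumed classifier scores for ten made-up samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.2, 0.1, 0.05]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} auc={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the area under the curve summarises performance across all thresholds, which is why an abstract such as the one above reports the AUC alongside one sensitivity/specificity pair.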

Shapanis, Andrew, Jones, Mark G, Schofield, James and Skipp, Paul (2023) Topological data analysis identifies molecular phenotypes of idiopathic pulmonary fibrosis. Thorax, 78 (7), 682-689. (doi:10.1136/thorax-2022-219731).

Record type: Article

Text
thorax-2022-219731.full - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 19 January 2023
e-pub ahead of print date: 20 February 2023
Published date: 1 July 2023
Additional Information: © Author(s) (or their employer(s)) 2023. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.
Keywords: idiopathic pulmonary fibrosis

Identifiers

Local EPrints ID: 476706
URI: http://eprints.soton.ac.uk/id/eprint/476706
ISSN: 0040-6376
PURE UUID: 72e2e274-6e13-4db4-b501-6cc28672ad3d
ORCID for Andrew Shapanis: orcid.org/0000-0003-4147-6956
ORCID for Mark G Jones: orcid.org/0000-0001-6308-6014
ORCID for Paul Skipp: orcid.org/0000-0002-2995-2959

Catalogue record

Date deposited: 11 May 2023 16:59
Last modified: 17 Mar 2024 03:58

Contributors

Author: Andrew Shapanis
Author: Mark G Jones
Author: James Schofield
Author: Paul Skipp
