Development of childhood asthma prediction models using machine learning and data integration
Childhood asthma is a chronic respiratory disease with substantial heterogeneity in its pathophysiology, presentation, trajectory and risk factors, particularly in early life. Because an objective diagnosis is difficult to obtain before the age of five, the ability to predict childhood asthma could facilitate the identification of high-risk children, reduce misdiagnoses of probable asthmatics or encourage the implementation of primary prevention strategies and personalised asthma management. To this end, a systematic review of existing prognostic prediction models for childhood asthma was conducted. It demonstrated that current models have mainly been developed using traditional regression-based methods, with few independently validated and none used in routine clinical practice. With the exploration of regression-based methods suggested to have been exhausted, this thesis aimed to explore novel approaches to data integration, using machine learning methods, to improve current childhood asthma predictions.
Using data from the Isle of Wight Birth Cohort (IOWBC, n=1456), the Childhood Asthma Prediction in Early-life (CAPE) and Childhood Asthma Prediction at Preschool-age (CAPP) models were developed to predict school-age asthma at 10 years using state-of-the-art machine learning methods. The CAPE and CAPP models used clinical and environmental data available from the first two years and first four years of life, respectively. Genome-wide genotype and methylation data were used to develop, respectively, a polygenic risk score (PRS) and two novel methylation risk scores (MRSs): a newborn MRS (nMRS) and a childhood MRS (cMRS), each intended to predict childhood asthma. These genomic risk scores were subsequently integrated with the CAPE and CAPP models using a step-wise approach. The generalisability of all developed models was evaluated using data from the Manchester Asthma and Allergy Study (MAAS).
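As an illustration of what a polygenic risk score is, the sketch below uses the conventional definition: a weighted sum of an individual's risk-allele counts. This record does not describe how the thesis actually constructed its PRS, so the function name, the effect-size weights and the genotype values are all hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Return the weighted sum of risk-allele dosages for one individual.

    dosages: risk-allele counts per variant (0, 1 or 2).
    weights: per-variant effect sizes (hypothetical values in this sketch).
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three variants (not taken from the thesis).
example_weights = [0.10, 0.25, -0.05]
# Hypothetical genotype of one child: 2, 1 and 0 copies of each risk allele.
example_dosages = [2, 1, 0]

example_score = polygenic_risk_score(example_dosages, example_weights)
print(f"PRS = {example_score:.2f}")
```

A methylation risk score can be sketched in the same way by replacing the allele counts with methylation levels at selected sites, although the thesis may have used a different construction.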
The CAPE and CAPP models outperformed their respective benchmark regression-based models in terms of area under the curve (AUC), with the CAPP model also surpassing the current best-performing validated model, the Paediatric Asthma Risk Score (PARS) (AUC: CAPE=0.71 vs. 0.64, CAPP=0.82 vs. PARS=0.80). The models generalised well in MAAS and showed excellent sensitivity for predicting a subgroup of individuals presenting with a persistent wheeze phenotype. Individually, the PRS and novel MRSs demonstrated moderate predictive ability (AUC: PRS=0.64, nMRS=0.61, cMRS=0.61). Integrating these genomic risk scores with the CAPE and CAPP models yielded marginal improvements in performance (AUC: integrated CAPE=0.75, integrated CAPP=0.84). Overall, the incorporation of genetic and epigenetic data to predict the broad phenotype of asthma offered limited predictive improvement.
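To make the AUC figures above easier to interpret, the following sketch computes AUC as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. It assumes that AUC here refers to the area under the receiver operating characteristic curve. The labels and scores are invented for illustration and are not data from IOWBC or MAAS.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of cases and non-cases.

    labels: 1 for asthma, 0 for no asthma.
    scores: predicted risks, where higher means more likely asthma.
    Ties between a case and a non-case count as half a correct ranking.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    correct = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                correct += 1.0
            elif c == n:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Invented outcomes and predicted risks for five children.
example_labels = [1, 1, 0, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.3, 0.1]

example_auc = auc(example_labels, example_scores)
print(f"AUC = {example_auc:.2f}")
```

On this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from non-cases.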
Using machine learning approaches, the CAPE and CAPP models improved upon the current regression-based models for the prediction of childhood asthma. Given the excellent sensitivity of the CAPE and CAPP models for predicting a subgroup of individuals presenting with a persistent wheeze phenotype, this thesis suggests that further exploration of machine learning methods focused on predicting asthma endotypes is warranted.
University of Southampton
Kothalawala, Dilini Mahesha
c22b9e92-e60a-44b6-a34b-2eb37a3a1212
November 2021
Holloway, John
4bbd77e6-c095-445d-a36b-a50a72f6fe1a
Kothalawala, Dilini Mahesha (2021) Development of childhood asthma prediction models using machine learning and data integration. University of Southampton, Doctoral Thesis, 285pp.
Record type: Thesis (Doctoral)
More information
Published date: November 2021
Identifiers
Local EPrints ID: 474335
URI: http://eprints.soton.ac.uk/id/eprint/474335
PURE UUID: cb427d48-3004-4179-94f1-55520a23c1e2
Catalogue record
Date deposited: 20 Feb 2023 17:50
Last modified: 17 Mar 2024 02:45
Contributors
Author: Dilini Mahesha Kothalawala