Development of childhood asthma prediction models using machine learning and data integration

Kothalawala, Dilini Mahesha (2021) Development of childhood asthma prediction models using machine learning and data integration. University of Southampton, Doctoral Thesis, 285pp.

Record type: Thesis (Doctoral)

Abstract

Childhood asthma is a chronic respiratory disease with substantial heterogeneity in its pathophysiology, presentation, trajectory and risk factors, particularly in early life. Because an objective diagnosis is difficult to obtain before the age of five, the ability to predict childhood asthma could help identify high-risk children, reduce misdiagnosis of probable asthmatics, and encourage the implementation of primary prevention strategies and personalised asthma management. To advance childhood asthma prediction, a systematic review of existing prognostic prediction models was conducted. It showed that current models have mainly been developed using traditional regression-based methods, that few have been independently validated, and that none is used in routine clinical practice. As regression-based methods have been suggested to be exhausted, this thesis aimed to explore novel approaches to data integration, using machine learning methods, to improve current childhood asthma predictions.
Using data from the Isle of Wight Birth Cohort (IOWBC, n=1456), the Childhood Asthma Prediction in Early-life (CAPE) and Childhood Asthma Prediction at Preschool-age (CAPP) models were developed with state-of-the-art machine learning methods to predict school-age asthma at 10 years. The CAPE and CAPP models used clinical and environmental data available from the first two years and the first four years of life, respectively. Genome-wide genotype data were used to develop a polygenic risk score (PRS), and genome-wide methylation data were used to develop two novel methylation risk scores (MRSs), a newborn MRS (nMRS) and a childhood MRS (cMRS), to predict childhood asthma. These genomic risk scores were subsequently integrated into the CAPE and CAPP models using a step-wise approach. The generalisability of all developed models was evaluated using data from the Manchester Asthma and Allergy Study (MAAS).
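In their common form, polygenic and methylation risk scores such as the PRS and MRSs mentioned above are weighted sums over genomic features. The abstract does not state how the thesis selected SNPs or CpG sites or derived their weights, so the following is only a minimal Python sketch under that general assumption; the function name `weighted_risk_score`, the feature identifiers (`snp_a`, `snp_b`) and the weights are hypothetical, not values from the thesis.

```python
from typing import Mapping


def weighted_risk_score(values: Mapping[str, float],
                        weights: Mapping[str, float]) -> float:
    """Generic risk score: sum of weight * value over every weighted feature.

    PRS-style use (assumed): `values` are allele dosages (0, 1 or 2 copies of
    the effect allele) and `weights` are per-SNP effect sizes such as log odds
    ratios taken from an external association study.
    MRS-style use (assumed): `values` are CpG methylation levels (e.g. beta
    values in [0, 1]) and `weights` are per-CpG coefficients.
    """
    # Raises KeyError if a weighted feature is missing for this individual,
    # rather than silently treating the missing value as zero.
    return sum(w * values[feature] for feature, w in weights.items())


# Hypothetical SNP identifiers and effect sizes, for illustration only.
effect_sizes = {"snp_a": 0.12, "snp_b": -0.05}
child_dosages = {"snp_a": 2, "snp_b": 1}
print(weighted_risk_score(child_dosages, effect_sizes))  # approximately 0.19
```

Failing loudly on a missing feature is a deliberate choice in this sketch; real pipelines usually impute missing genotypes or methylation values before scoring.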
The CAPE and CAPP models outperformed their respective benchmark regression-based models in terms of area under the curve (AUC), and the CAPP model also surpassed the current best-performing validated model, the Paediatric Asthma Risk Score (PARS) (AUC: CAPE=0.71 vs. 0.64, CAPP=0.82 vs. PARS=0.80). The models generalised well to MAAS and showed excellent sensitivity for predicting a subgroup of individuals presenting with a persistent wheeze phenotype. Individually, the PRS and the novel MRSs demonstrated moderate predictive ability (AUC: PRS=0.64, nMRS=0.61, cMRS=0.61). Integrating these genomic risk scores into the CAPE and CAPP models yielded only a marginal improvement in performance (AUC: integrated CAPE=0.75, integrated CAPP=0.84). Overall, incorporating genetic and epigenetic data to predict the broad phenotype of asthma offered limited predictive improvement.
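The AUC values reported above summarise how well a model's scores separate children who did and did not develop asthma, while sensitivity is the share of true cases a model flags at a given cut-off. As an illustration only (not the evaluation code used in the thesis), the sketch below computes the AUC through its rank-based (Mann-Whitney) interpretation and sensitivity at a chosen threshold; the function names, scores, labels and the 0.5 threshold are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen case (label 1) scores
    higher than a randomly chosen control (label 0); ties count as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    concordant = 0.0
    # Compare every case with every control (O(cases * controls) pairs).
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def sensitivity(scores: Sequence[float], labels: Sequence[int],
                threshold: float) -> float:
    """True positive rate: share of cases whose score is at or above threshold."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    if not cases:
        raise ValueError("sensitivity needs at least one case")
    return sum(1 for s in cases if s >= threshold) / len(cases)


# Hypothetical predicted probabilities and outcomes (1 = asthma at follow-up).
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(roc_auc(scores, labels))           # 5 of 6 case-control pairs concordant
print(sensitivity(scores, labels, 0.5))  # 0.5
```

The pairwise loop is easy to check by hand but scales quadratically; for larger datasets a sort-based implementation such as scikit-learn's `roc_auc_score` would be the more practical choice.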
Using machine learning approaches, the CAPE and CAPP models improved upon current regression-based models for the prediction of childhood asthma. Given also their excellent sensitivity for predicting a subgroup of individuals presenting with a persistent wheeze phenotype, this thesis suggests that further exploration of machine learning methods focused on predicting asthma endotypes is warranted.


Development of Childhood Asthma Prediction Models using Machine Learning and Data Integration - Version of Record

Available under License University of Southampton Thesis Licence.
