Integration of health informatics: ‘big data’ for clinical translation in inflammatory bowel disease

Inflammatory bowel disease (IBD) is a chronic, complex autoimmune disease characterised by relapsing-remitting gastrointestinal tract inflammation. It is considered to arise from interactions between an individual’s genetic susceptibility, environmental factors, immune dysregulation, and gut microbial dysbiosis. Genetics can make a larger contribution to IBD pathology in some patients, and this is thought to be linked to age of diagnosis, with genetic factors having the largest effects in very young children. There are two main subtypes of IBD: ulcerative colitis (UC) and Crohn’s disease (CD). Within subtypes, there are different disease behaviours and severities. One disease behaviour of particular interest is the stricturing endotype, which causes a narrowing of the gastrointestinal tract that often requires surgery.

This thesis first examines oxidative stress in IBD patients using assay data. Here, statistical and machine learning (ML) methods are employed to examine the relationship between the clinical and genomic characteristics of a set of paediatric patients and their measured oxidative stress and antioxidant potential. No results from this work suggested that these assay data could serve as an indicator of these clinical features, or of pathogenic variation in key oxidative stress genes.

The predominant focus of this thesis is the use of genomic data and ML to stratify IBD patients. To prepare genomic data for use in ML pipelines, the GenePy algorithm was used. GenePy takes in information regarding zygosity, allele frequency, and predicted deleteriousness for every variant in a gene. The scores for each variant are summed to create an overall gene score, producing a per-gene, per-individual matrix of scores. The two clinical problems analysed here were classifying IBD patients according to their subtype, and stratifying CD patients by the presence or absence of a stricturing endotype. This was achieved with a random forest classifier. Optimisation of both the input data and the ML algorithm for these classifications was an important aspect of this work. Several gene panels were trialled for these classifications, and an autoimmune gene panel outperformed an IBD gene panel for determining IBD subtype. Stratifying CD patients by their stricturing endotype was subsequently performed with a random survival forest, which combines a random forest with survival analysis methods. This method is better suited to the longitudinal nature of stricturing endotype development. This work demonstrated challenges that arise from the sparsity of genomic data, and required the development of a pipeline that could reduce the sparsity of the features used by the ML algorithm.

The patient stratification performed here provided strong evidence for the presence of different genomic variation patterns within IBD subtypes, and within the CD stricturing endotype. With increased dataset sizes, it may be possible to more clearly detect and cluster patients according to their genomic variation. To take full advantage of this knowledge, there is an additional requirement for deep, varied and longitudinal clinical data. Genomic data can then guide each patient’s clinical pathway, providing individuals with more personalised, life-long care.
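
The gene-level scoring described in the abstract (per-variant scores built from zygosity, allele frequency and predicted deleteriousness, summed into a per-gene, per-individual matrix) can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The per-variant term used here, -D * log10(f_a * f_b) for a deleteriousness value D and the frequencies f_a and f_b of the two alleles carried, is an assumption chosen to match the description. The function names, the input tuple layout and the example calls are all hypothetical.

```python
import math
from collections import defaultdict


def variant_score(alt_copies, alt_freq, deleteriousness):
    """Toy per-variant score (assumed form): rarer, more deleterious alleles score higher.

    alt_copies      -- copies of the alternate allele carried (0, 1 or 2)
    alt_freq        -- population frequency of the alternate allele, strictly in (0, 1)
    deleteriousness -- predicted deleteriousness scaled to [0, 1]
    """
    if alt_copies == 0:
        return 0.0  # no alternate allele carried, so no contribution
    # Frequencies of the two alleles carried at this site:
    # heterozygous -> alt and ref; homozygous alt -> alt and alt.
    first = alt_freq
    second = alt_freq if alt_copies == 2 else 1.0 - alt_freq
    return -deleteriousness * math.log10(first * second)


def gene_score_matrix(calls):
    """Sum variant scores into a {sample: {gene: score}} mapping."""
    matrix = defaultdict(lambda: defaultdict(float))
    for sample, gene, alt_copies, alt_freq, deleteriousness in calls:
        matrix[sample][gene] += variant_score(alt_copies, alt_freq, deleteriousness)
    return {sample: dict(genes) for sample, genes in matrix.items()}


# Hypothetical genotype calls: (sample, gene, alt_copies, alt_freq, deleteriousness)
calls = [
    ("patient_1", "NOD2", 1, 0.01, 0.9),
    ("patient_1", "NOD2", 2, 0.10, 0.5),
    ("patient_2", "NOD2", 0, 0.01, 0.9),
]
scores = gene_score_matrix(calls)
```

Under this assumed form, a homozygous call scores higher than a heterozygous call at the same site, and a sample carrying no alternate alleles in a gene scores zero for it. Many zero entries of this kind are one way the sparsity mentioned in the abstract can arise. Each inner mapping corresponds to one row of the per-gene, per-individual matrix.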

Machine Learning, Inflammatory bowel disease, Genomics

University of Southampton

Stafford, Imogen S.

50987dc1-3772-408f-9093-9124f3d6b2cd

September 2023

Ennis, Sarah

7b57f188-9d91-4beb-b217-09856146f1e9

Mossotto, Enrico

a2a572db-3e95-41c6-94f6-f1b019594372

Beattie, Robert M

9a66af0b-f81c-485c-b01d-519403f0038a

Macarthur, Benjamin

2c0476e7-5d3e-4064-81bb-104e8e88bb6b

Stafford, Imogen S. (2023) Integration of health informatics: ‘big data’ for clinical translation in inflammatory bowel disease. University of Southampton, Doctoral Thesis, 396pp.

Record type: Thesis (Doctoral)

Doctoral Thesis for Imogen Stafford PDFA - Integration_of_health_informatics_big_data_for_clinical_translation_in_IBD - Version of Record

Available under License University of Southampton Thesis Licence.

Final-thesis-submission-Examination-Miss-Imogen-Stafford

Restricted to Repository staff only

Available under License University of Southampton Thesis Licence.

More information

Submitted date: June 2023

Published date: September 2023

Identifiers

Local EPrints ID: 482291

URI: http://eprints.soton.ac.uk/id/eprint/482291

PURE UUID: 4f5190ec-2fbd-4faf-8784-09081610d974

ORCID for Imogen S. Stafford:

orcid.org/0000-0003-1666-1906

ORCID for Sarah Ennis:

orcid.org/0000-0003-2648-0869

ORCID for Benjamin Macarthur:

orcid.org/0000-0002-5396-9750

Catalogue record

Date deposited: 26 Sep 2023 16:36

Last modified: 18 Mar 2024 03:49

Contributors

Author: Imogen S. Stafford

Thesis advisor: Sarah Ennis

Thesis advisor: Enrico Mossotto

Thesis advisor: Robert M Beattie

Thesis advisor: Benjamin Macarthur
