The University of Southampton
University of Southampton Institutional Repository

Advanced modelling of genomic data in Inflammatory Bowel Disease


Mossotto, Enrico (2018) Advanced modelling of genomic data in Inflammatory Bowel Disease. University of Southampton, Doctoral Thesis, 214pp.

Record type: Thesis (Doctoral)

Abstract

Advances in next generation sequencing technologies allow the collection of enormous volumes of genomic data on large patient cohorts. Concurrently, machine learning algorithms are rapidly evolving and, together, these technologies represent the new frontier of research and clinical management on a path leading toward personalised medicine.

This thesis has two aims. The first is to develop a mathematical framework for the analysis and integration of next generation sequencing data. The second is to model data from patients affected by inflammatory bowel disease (IBD), a common complex autoimmune condition with increasing incidence worldwide, by applying machine learning methodologies to clinical and transformed genomic data.

The analyses presented in this thesis are largely based on a cohort of paediatric IBD patients for whom clinical, immunology and whole exome sequencing data were available.

This research illustrates supervised and unsupervised machine learning approaches that model histology and endoscopy data to assign IBD patients to the correct Crohn's disease (CD) or ulcerative colitis (UC) subtype with superior accuracy.

Stratification and classification of IBD patients can be improved by layering genomic data on top of clinical evidence. This thesis also describes the development of GenePy, a mathematical model for transforming patients' genomic data into a per-individual, per-gene deleteriousness scoring system. GenePy models and incorporates important biological information from whole exome sequencing of patient DNA. It eases the analysis and interpretation of genomic data on an individual basis while also allowing genetic profiles to be compared across patients. GenePy gene scores can be further combined according to molecular processes or pathways.

This work describes eight novel immuno-genomic IBD subtypes observed in a small cohort for which immune cytokine signalling and response cascades were specifically profiled and GenePy scores obtained.

In addition, the GenePy algorithm is applied using both supervised and unsupervised approaches to classify IBD subtypes and to explore alternative disease classifications that discriminate molecular subtypes clinically relevant for treatment and prognosis. This thesis reports the current highest performance in discriminating IBD subtypes using exome sequencing data, and five novel genomic patient strata defined by different mutational burdens of adaptive immune system genes.

This work demonstrates the power of integrating 21st-century high-throughput digital data in machine learning frameworks and the potential to obtain clinically relevant strata for bench-to-bedside improvements in patient quality of life.

Text
Mossotto_e-thesis - Version of Record
Available under License University of Southampton Thesis Licence.

More information

Published date: May 2018

Identifiers

Local EPrints ID: 422261
URI: http://eprints.soton.ac.uk/id/eprint/422261
PURE UUID: 350d12f7-1877-4e70-af64-3319c7c77f54
ORCID for Enrico Mossotto: orcid.org/0000-0003-3996-3931
ORCID for Sarah Ennis: orcid.org/0000-0003-2648-0869
ORCID for Benjamin Macarthur: orcid.org/0000-0002-5396-9750
ORCID for Jacek Brodzki: orcid.org/0000-0002-4524-1081

Catalogue record

Date deposited: 20 Jul 2018 16:30
Last modified: 16 Mar 2024 03:24

Contributors

Author: Enrico Mossotto
Thesis advisor: Sarah Ennis
Thesis advisor: Benjamin Macarthur
Thesis advisor: Jacek Brodzki
