The University of Southampton
University of Southampton Institutional Repository

Improving diagnosis of genetic disease through computational investigation of splicing


Strauch, Yaron Leander (2023) Improving diagnosis of genetic disease through computational investigation of splicing. University of Southampton, Doctoral Thesis, 193pp.

Record type: Thesis (Doctoral)

Abstract

Although an estimated 50% of pathogenic genomic mutations are related to splicing, this inherently complex mechanism is not yet fully understood. Identifying splice disruption is a complicated expert task requiring manual labour and expensive sequencing. With the emergence of Machine Learning for targeted medicine, modelling splicing computationally allows faster and less expensive analysis and, ultimately, treatment. This project curates, analyses, optimises, and utilises Machine Learning datasets and algorithms for splicing-related disease using supervised and unsupervised techniques. A clinical dataset of splice-disrupting variants is curated, processed, and validated to assess algorithmic predictive performance on clinically relevant data. Predictions are improved by data engineering to include isoforms with lower expression. Other avenues, such as including protein binding sites, incorporating genomic conservation, and semantic encoding of DNA data, are explored. CI-SpliceAI, a new algorithm to predict aberrant splicing, is developed and made available to the wider scientific community. Methods for explaining shallow and deep learning models are applied to visualise the feature contributions of otherwise black-box algorithms and to extract new insights about the underlying biological problem.

Text
Improving Diagnosis of Genetic Disease through Computational Investigation of Splicing
Download (10MB)
Text
Permission to deposit thesis - Yaron Strauch.db_TAN
Restricted to Repository staff only

More information

Submitted date: November 2022
Published date: January 2023

Identifiers

Local EPrints ID: 475951
URI: http://eprints.soton.ac.uk/id/eprint/475951
PURE UUID: 5078cebe-f004-4ed9-b928-ca60cb240a4b
ORCID for Yaron Leander Strauch: orcid.org/0000-0003-0820-8319
ORCID for Diana Baralle: orcid.org/0000-0003-3217-4833

Catalogue record

Date deposited: 31 Mar 2023 17:05
Last modified: 17 Mar 2024 03:13

Contributors

Author: Yaron Leander Strauch
Thesis advisor: Diana Baralle
