Improving diagnosis of genetic disease through computational investigation of splicing
Although an estimated 50% of pathogenic genomic mutations are related to splicing, this inherently complex mechanism is not yet fully understood. Identifying splice disruption is a complicated expert task that requires manual labour and expensive sequencing. With the emergence of machine learning for targeted medicine, modelling splicing computationally allows faster and less expensive analysis and, ultimately, treatment. This project curates, analyses, optimises, and utilises machine learning datasets and algorithms for splicing-related disease using supervised and unsupervised techniques. A clinical dataset of splice-disrupting variants is curated, processed, and validated to assess algorithmic predictive performance on clinically relevant data. Predictions are improved by data engineering to include isoforms with lower expression. Other avenues, such as including protein binding sites, incorporating genomic conservation, and semantic encoding of DNA data, are explored. CI-SpliceAI, a new algorithm to predict aberrant splicing, is developed and made available to the wider scientific community. Methods for explaining shallow and deep learning models are applied to visualise the feature contributions of otherwise black-box algorithms and to extract new insights about the underlying biological problem.
University of Southampton
Strauch, Yaron Leander
a246b519-5a8a-4011-b330-7184abc055eb
January 2023
Baralle, Diana
faac16e5-7928-4801-9811-8b3a9ea4bb91
Strauch, Yaron Leander (2023) Improving diagnosis of genetic disease through computational investigation of splicing. University of Southampton, Doctoral Thesis, 193pp.
Record type: Thesis (Doctoral)
More information
Submitted date: November 2022
Published date: January 2023
Identifiers
Local EPrints ID: 475951
URI: http://eprints.soton.ac.uk/id/eprint/475951
PURE UUID: 5078cebe-f004-4ed9-b928-ca60cb240a4b
Catalogue record
Date deposited: 31 Mar 2023 17:05
Last modified: 13 Aug 2024 01:41
Contributors
Author: Yaron Leander Strauch