Development and application of methods for resolving molecular diagnoses from patient sequence data for monogenic diseases

Identifying the molecular causes of disease from sequenced genomes can be extremely challenging and usually requires tiered filtering, which carries the risk that the causal variant(s) are missed. The first stage of this study focused on understanding specific properties of genes, including essentiality, haploinsufficiency and selection, and on linking these properties to facilitate the prediction of disease-causing genes. Gene essentiality refers to genes that are required for the survival of cells. The study identified 20 gene-specific scores in the literature, each of which measures different genetic features, and showed that no single score has so far been reliably predictive of genic deleteriousness. This systematic review helped to identify gaps and challenges in the prediction of disease genes that might affect the diagnosis of monogenic diseases. Considering information on genes rather than variants broadens the scope for assessing gene pathogenicity. The second stage drew on this information to build a model that filters clinical sequence data and reduces the number of potential disease-causing genes to follow up. An essentiality-specific pathogenicity prioritisation (ESPP) score was constructed to prioritise disease-causing genes. Compared with any single score, ESPP showed improved performance in identifying disease genes, which score high, and in excluding non-disease genes, which score low. The third stage evaluated the proposed gene-level score as a guide to prioritising disease genes by testing it against multiple databases and integrating it with alternative scores. This contributes significantly to improving data interpretation. The results were encouraging: two genes, CNOT1 and RYR3, that ESPP prioritised as strong candidates for Mendelian diseases were subsequently confirmed to be causal. In addition, the sum of ranks across alternative scores (ESPP, LOEUF and CoNeS) highlighted four highly ranked genes (SETD1A, SMARCC2, KDM3B, MED12L) that are now known to contain disease variants. Ultimately, applying such models to sequence data from patients with monogenic diseases will help identify the molecular causes of these conditions.
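
The sum-of-ranks combination mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the placeholder gene labels, the toy values and the assumed direction of each score (whether a high or a low value indicates a stronger candidate) are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

def rank_positions(scores: Dict[str, float], higher_is_stronger: bool) -> Dict[str, int]:
    """Rank genes from 1 (strongest candidate) upwards for one score.

    Ties are not averaged here; tied genes keep their input order.
    """
    ordered = sorted(scores, key=scores.get, reverse=higher_is_stronger)
    return {gene: position + 1 for position, gene in enumerate(ordered)}

def sum_of_ranks(tables: Dict[str, Dict[str, float]],
                 higher_is_stronger: Dict[str, bool]) -> List[str]:
    """Order genes by the sum of their per-score ranks (lowest sum first)."""
    # Only genes present in every score table can be compared fairly.
    genes = set.intersection(*(set(table) for table in tables.values()))
    totals = {gene: 0 for gene in genes}
    for name, table in tables.items():
        shared = {gene: table[gene] for gene in genes}
        ranks = rank_positions(shared, higher_is_stronger[name])
        for gene in genes:
            totals[gene] += ranks[gene]
    # Break equal totals alphabetically so the output is deterministic.
    return sorted(totals, key=lambda gene: (totals[gene], gene))

# Toy values for three hypothetical scores and three placeholder genes.
toy_tables = {
    "score_a": {"GENE_A": 0.9, "GENE_B": 0.2, "GENE_C": 0.5},
    "score_b": {"GENE_A": 0.1, "GENE_B": 0.8, "GENE_C": 0.4},
    "score_c": {"GENE_A": -2.0, "GENE_B": 1.0, "GENE_C": -0.5},
}
# Assumed directions: score_a is high for strong candidates, the others low.
toy_directions = {"score_a": True, "score_b": False, "score_c": False}

prioritised = sum_of_ranks(toy_tables, toy_directions)
print(prioritised)
```

In this toy input GENE_A is ranked first by all three scores, so it has the lowest rank sum and heads the combined list. A real analysis would need to confirm the direction of each published score and decide how to handle ties and genes missing from one of the score tables.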

University of Southampton

Alyousfi, Dareen Mohammed

d3304c17-f4a4-4928-a721-cf8886302c0e

30 June 2022

Collins, Andrew

7daa83eb-0b21-43b2-af1a-e38fb36e2a64

Alyousfi, Dareen Mohammed (2022) Development and application of methods for resolving molecular diagnoses from patient sequence data for monogenic diseases. University of Southampton, Doctoral Thesis, 210pp.

Record type: Thesis (Doctoral)

Development and application of methods for resolving molecular diagnoses from patient sequence data for monogenic diseases - Version of Record

Available under License University of Southampton Thesis Licence.

More information

Submitted date: July 2021

Published date: 30 June 2022

Identifiers

Local EPrints ID: 474722

URI: http://eprints.soton.ac.uk/id/eprint/474722

PURE UUID: c1bf715a-74ea-4436-a683-2416a6d78389

ORCID for Andrew Collins:

orcid.org/0000-0001-7108-0771

Catalogue record

Date deposited: 02 Mar 2023 17:30

Last modified: 17 Mar 2024 02:38

Contributors

Author: Dareen Mohammed Alyousfi

Thesis advisor: Andrew Collins
