Development and application of methods for resolving molecular diagnoses from patient sequence data for monogenic diseases
Identifying the molecular causes of disease from sequenced genomes can be extremely challenging. It usually requires tiered filtering, which carries the risk that causal variants are missed. The first stage of this study focused on understanding specific properties of genes, including essentiality, haploinsufficiency, and selection, and on linking these properties to facilitate the prediction of disease-causing genes. Gene essentiality refers to genes that are required for cell survival. The study identified 20 gene-specific scores in the literature, each measuring different genetic features, and showed that no single score to date reliably predicts genic deleteriousness. This systematic review helped to identify gaps and challenges in the prediction of disease genes that might affect the diagnosis of monogenic diseases. Considering information on genes rather than on variants alone broadens the basis for assessing gene pathogenicity. The second stage used this information to build a model that filters clinical sequence data and reduces the number of potential disease-causing genes to follow up. An essentiality-specific pathogenicity prioritisation (ESPP) score was constructed to prioritise disease-causing genes. Compared with any single score, ESPP performed better at identifying disease genes, which score high, and so helped to exclude non-disease genes, which score low. The third stage evaluated the proposed gene-level score as a guide to prioritising disease genes by testing it against multiple databases and integrating it with alternative scores. This contributes to improving the interpretation of sequence data. The results were encouraging: two genes, CNOT1 and RYR3, that ESPP prioritised as strong candidates for Mendelian diseases were subsequently confirmed to be causal.
In addition, summing the ranks of alternative scores (ESPP, LOEUF and CoNeS) highlighted four highly ranked genes (SETD1A, SMARCC2, KDM3B, MED12L) that are now known to harbour disease-causing variants. Ultimately, applying such models to sequence data from patients with monogenic diseases will help to identify the molecular causes of these conditions.
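The sum-of-ranks combination mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the gene names and score values are invented placeholders, the function names are hypothetical, and the score directions are assumptions (higher ESPP taken as a stronger candidate, following the abstract; lower LOEUF and lower CoNeS taken as more constrained).

```python
from typing import Dict, List, Tuple


def rank_genes(scores: Dict[str, float], higher_is_stronger: bool) -> Dict[str, float]:
    """Rank genes by one score; rank 1 is the strongest candidate.

    Tied genes share the average of the ranks they span.
    """
    ordered = sorted(scores, key=lambda g: scores[g], reverse=higher_is_stronger)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over any run of genes with the same score value.
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2
        for gene in ordered[i : j + 1]:
            ranks[gene] = average_rank
        i = j + 1
    return ranks


def sum_of_ranks(
    score_tables: Dict[str, Dict[str, float]],
    higher_is_stronger: Dict[str, bool],
) -> List[Tuple[str, float]]:
    """Sum per-score ranks over genes present in every table.

    Returns (gene, rank_sum) pairs, lowest (strongest) rank sum first.
    """
    shared = set.intersection(*(set(table) for table in score_tables.values()))
    totals = {gene: 0.0 for gene in shared}
    for name, table in score_tables.items():
        restricted = {gene: table[gene] for gene in shared}
        for gene, rank in rank_genes(restricted, higher_is_stronger[name]).items():
            totals[gene] += rank
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


# Placeholder values for illustration only; these are not real gene scores.
score_tables = {
    "ESPP": {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.7, "GENE_D": 0.1},
    "LOEUF": {"GENE_A": 0.2, "GENE_B": 0.9, "GENE_C": 0.15, "GENE_D": 1.5},
    "CoNeS": {"GENE_A": -2.0, "GENE_B": 0.1, "GENE_C": -1.5, "GENE_D": 0.8},
}
# Assumed directions: higher ESPP, lower LOEUF and lower CoNeS are stronger.
higher_is_stronger = {"ESPP": True, "LOEUF": False, "CoNeS": False}

ranked = sum_of_ranks(score_tables, higher_is_stronger)
for gene, total in ranked:
    print(gene, total)
```

Ranks are used here instead of raw values because the three scores sit on different scales and run in different directions. A gene that ranks near the top on every score ends up with a low rank sum.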
University of Southampton
Alyousfi, Dareen Mohammed
d3304c17-f4a4-4928-a721-cf8886302c0e
30 June 2022
Collins, Andrew
7daa83eb-0b21-43b2-af1a-e38fb36e2a64
Alyousfi, Dareen Mohammed (2022) Development and application of methods for resolving molecular diagnoses from patient sequence data for monogenic diseases. University of Southampton, Doctoral Thesis, 210pp.
Record type: Thesis (Doctoral)
More information
Submitted date: July 2021
Published date: 30 June 2022
Identifiers
Local EPrints ID: 474722
URI: http://eprints.soton.ac.uk/id/eprint/474722
PURE UUID: c1bf715a-74ea-4436-a683-2416a6d78389
Catalogue record
Date deposited: 02 Mar 2023 17:30
Last modified: 17 Mar 2024 02:38
Contributors
Author: Dareen Mohammed Alyousfi