READ ME File For 'Dataset title'

Dataset DOI: 10.5258/SOTON/D2910

ReadMe Author: Dr Eleanor Seaby, University of Southampton
ORCID: 0000-0002-6814-8648

This dataset supports the thesis entitled "Methods to identify novel disease genes and uplift diagnosis rates in rare diseases"
AWARDED BY: University of Southampton
DATE OF AWARD: 2023

DESCRIPTION OF THE DATA

This dataset contains:

1. Folder called 'Appendix papers'
This contains 15 articles published in peer-reviewed journals or preprint archives, which represent work from my thesis.

Appendix Paper 1 | Strategies to Uplift Novel Mendelian Gene Discovery for Improved Clinical Outcomes
Appendix Paper 2 | Challenges in the diagnosis and discovery of rare genetic disorders using contemporary sequencing technologies
Appendix Paper 3 | The mutational constraint spectrum quantified from variation in 141,456 humans
Appendix Paper 4 | Transcript expression-aware annotation improves rare variant interpretation
Appendix Paper 5 | Addendum: The mutational constraint spectrum quantified from variation in 141,456 humans
Appendix Paper 6 | Advanced variant classification framework reduces the false positive rate of predicted loss of function (pLoF) variants in population sequencing data
Appendix Paper 7 | A gene-to-patient approach uplifts novel disease gene discovery and identifies 18 putative novel disease genes
Appendix Paper 8 | Response to Ramos et al.
Appendix Paper 9 | 100,000 Genomes Pilot on Rare-Disease Diagnosis in Health Care — Preliminary Report
Appendix Paper 10 | Loss-of-function variants in TAF4 are associated with a neurodevelopmental disorder. Human Mutation
Appendix Paper 11 | Monogenic de novo variants in DDX17 cause a novel neurodevelopmental disorder
Appendix Paper 12 | Targeting de novo loss of function variants in constrained disease genes improves diagnostic rates in the 100,000 Genomes Project
Appendix Paper 13 | A gene pathogenicity tool ‘GenePy’ identifies missed biallelic diagnoses in the 100,000 Genomes Project
Appendix Paper 14 | A panel-agnostic strategy ‘HiPPo’ improves diagnostic efficiency in the UK Genome Medicine Service
Appendix Paper 15 | A novel variant in GATM causes idiopathic renal Fanconi syndrome and predicts progression to end-stage kidney disease

2. Folder called 'Supplementary Datasets'
All data can be opened using Microsoft Excel.

Supplementary Dataset SD1 | Enriched biological processes in DDX17 RNA-seq data [Co-author Cyril F. Bourgeois; University of Lyon]
Supplementary Dataset SD2 | Curation of pLoF variants in haploinsufficient genes
Supplementary Dataset SD3 | Curation of 3362 homozygous pLoF variants [Co-authors Moriel Singer-Berk, Eleina England; Broad Institute of MIT and Harvard]
Supplementary Dataset SD4 | Detailed phenotype table of patients with DDX17 variants
Supplementary Dataset SD5 | Differentially expressed genes in DDX17-KD cells compared to control cells [Co-author Cyril F. Bourgeois; University of Lyon]
Supplementary Dataset SD6 | Detailed phenotype table of patients with HDLBP variants
Supplementary Dataset SD7 | Manual curation of 45 remaining variants [Co-author N. Simon Thomas; University of Southampton]
Supplementary Dataset SD8 | Re-analysis of DeNovoLOEUF on 100,000 Genomes Project data
Supplementary Dataset SD9 | 36 possible missed diagnoses in patients with a cardiomyopathy phenotype
Supplementary Dataset SD10 | Genes associated with cardiomyopathies
Supplementary Dataset SD11 | Autosomal recessive disease genes
Supplementary Dataset SD12 | 682 participants with a potential missed diagnosis
Supplementary Dataset SD13 | Variants identified using the HiPPo protocol

3. Folder called 'Supplementary Tables'
All data can be opened using Microsoft Excel.

Supplementary Table S1 | Environmental tools in GEL
Supplementary Table S2 | List of 1,815 genes tolerant of homozygous loss-of-function variation [Co-author Moriel Singer-Berk; Broad Institute of MIT and Harvard]
Supplementary Table S3 | Genes tolerant of homozygous loss-of-function variation with an OMIM dominant association
Supplementary Table S4 | 27 genes with more than one Genomics England kindred affected
Supplementary Table S5 | 99 Class 2 and Class 3 genes
Supplementary Table S6 | Sequences of siRNAs against DDX17 [Co-author Cyril F. Bourgeois; University of Lyon]
Supplementary Table S7 | A summary of high-level phenotypes of the 100,000 Genomes Project patient population
Supplementary Table S8 | All human genes curated with a LOEUF score
Supplementary Table S9 | 182 participants without a listed cardiomyopathy phenotype that had a pathogenic variant returned by 100KGP in a cardiomyopathy-related gene
Supplementary Table S10 | Quality control of 24 samples from 8 families undergoing parallel research exome and clinical genome [Co-author Nichola Grahame; University of Southampton]

4. Folder called 'Supplementary Figures'
Contains a single Word document with the following figures:

Supplementary Figure S1 | CRISPR/Cas9 microinjection into X. tropicalis eggs produces mosaic homozygous crispant tadpoles encoding truncated Ddx17 which is inherited in the F1 generation [Co-authors Annie Godwin, Matt Guille; University of Portsmouth]
Supplementary Figure S2 | The amino acid alignment between the H. sapiens and X. tropicalis Ddx17 proteins [Co-authors Annie Godwin, Matt Guille; University of Portsmouth]
Supplementary Figure S3 | F0 mosaic homozygous X. tropicalis display reduced axon outgrowth, and working memory like F1 models, but also gastrulation defects and short term microcephaly [Co-authors Annie Godwin, Matt Guille]
Supplementary Figure S4 | Results of dark-light transitions assay and neuronal outgrowth [Co-authors Annie Godwin, Matt Guille; University of Portsmouth]
Supplementary Figure S5 | Compound heterozygous ddx17-/- tadpoles are morphologically normal but show working memory deficits [Co-authors Annie Godwin, Matt Guille; University of Portsmouth]
Supplementary Figure S6 | Network representation of the top 40 enriched biological processes [Co-author Cyril F. Bourgeois; University of Lyon]
Supplementary Figure S7 | Enriched biological processes for down-regulated and up-regulated genes [Co-author Cyril F. Bourgeois; University of Lyon]

Date of data collection: Data were collected between October 2019 and August 2023.

Information about geographic location of data collection: Unless an exception is listed below, all data were collected in the UK.

Data collected in France:
Supplementary Dataset SD1
Supplementary Dataset SD5
Supplementary Table S6
Supplementary Figure S6
Supplementary Figure S7

Data collected in the USA:
Supplementary Dataset SD2
Supplementary Dataset SD3
Supplementary Table S2

Licence: [ADD IN]

Related projects/Funders:
I, Eleanor Seaby, was generously supported by the Gerald Kerkut Charitable Trust, the Foulkes Foundation, and a University of Southampton Presidential Scholarship.
Work in collaboration with Matt Guille and Annie Godwin was supported by MRC award MR/V012177/1.
Work in collaboration with the Broad Institute was supported by National Human Genome Research Institute (NHGRI) grant U01HG011755, as part of the GREGoR consortium, and NHGRI grant R01HG009141.

Related publication: All publications related to the work in my thesis are listed above under folder 1, 'Appendix papers'.

Date that the file was created: December 2023