The University of Southampton
University of Southampton Institutional Repository

GenePy - a score for estimating gene pathogenicity in individuals using next-generation sequencing data

Mossotto, Enrico, Ashton, James, O'Gorman, Luke, Pengelly, Reuben, Beattie, R. Mark, Macarthur, Benjamin and Ennis, Sarah (2019) GenePy - a score for estimating gene pathogenicity in individuals using next-generation sequencing data. BMC Bioinformatics, 20 (1), 1-15. (doi:10.1186/s12859-019-2877-3).

Record type: Article

Abstract

Background
Next-generation sequencing (NGS) is revolutionising the diagnosis and treatment of rare diseases; however, its application to understanding common disease aetiology is limited. Rare disease applications binarily attribute genetic change(s) at a single locus to a specific phenotype. In common diseases, where multiple genetic variants within and across genes contribute to disease, binary modelling cannot capture the burden of pathogenicity harboured by an individual across a given gene or pathway. We present GenePy, a novel gene-level scoring system for the integration and analysis of NGS data on a per-individual basis, which shifts NGS data interpretation from the variant level to the gene level. This simple and flexible scoring system is intuitive and amenable to integration with machine learning, network and topological approaches, facilitating the investigation of complex phenotypes.

Results
Whole-exome sequencing data from 508 individuals were used to generate GenePy scores. For each variant, a score is calculated incorporating: i) population allele frequency estimates; ii) individual zygosity, determined through standard variant calling pipelines; and iii) any user-defined deleteriousness metric to inform on functional impact. GenePy then combines the scores generated for all observed variants into a single gene score for each individual. We generated a matrix of ~14,000 GenePy scores for all individuals for each of sixteen popular deleteriousness metrics. All per-gene scores are corrected for gene length. The majority of genes generate GenePy scores < 0.01, although individuals harbouring multiple rare, highly deleterious mutations can accumulate extremely high GenePy scores. In the absence of a comparator metric, we examine GenePy performance in discriminating genes known to be associated with three common, complex diseases. A Mann-Whitney U test conducted on GenePy scores for this positive control gene in cases versus controls demonstrates markedly more significant results (p = 1.37 × 10⁻⁴) than the most commonly applied association tool that combines common and rare variation (p = 0.003).

Conclusions
Per-gene, per-individual GenePy scores are intuitive when assessing genetic variation in individual patients or comparing scores between groups. GenePy outperforms the currently accepted best-practice tools for combining common and rare variation. GenePy scores are suitable for downstream integration with transcriptomic and proteomic data, which also report at the gene level.
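
This record does not state the GenePy formula itself. As a hedged illustration of how the three per-variant ingredients listed in the abstract (population allele frequency, zygosity and a deleteriousness metric) could be combined, the Python sketch below assumes a per-variant score of D × -log10(f1 × f2), where f1 and f2 are the population frequencies of the two alleles an individual carries at a site. It sums these within a gene, divides by gene length in base pairs, and computes a Mann-Whitney U statistic for comparing cases with controls. The `Variant` class, all function names, the treatment of reference homozygotes and the exact score and length-correction forms are assumptions for illustration, not the published implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant record for one individual (names are illustrative)."""
    alt_freq: float         # population frequency of the alternate allele, 0 < f < 1
    alt_copies: int         # zygosity from variant calling: 0 = hom-ref, 1 = het, 2 = hom-alt
    deleteriousness: float  # any user-chosen metric, assumed rescaled to [0, 1]


def variant_score(v: Variant) -> float:
    """Assumed per-variant score: D * -log10(f1 * f2), where f1 and f2 are the
    population frequencies of the two alleles this individual carries."""
    if not 0.0 < v.alt_freq < 1.0:
        raise ValueError("alt_freq must lie strictly between 0 and 1")
    if v.alt_copies == 0:
        return 0.0  # assumption: reference homozygotes contribute nothing
    ref_freq = 1.0 - v.alt_freq
    f1 = v.alt_freq
    f2 = v.alt_freq if v.alt_copies == 2 else ref_freq
    # Rarer alleles, and homozygous rather than heterozygous calls, give larger terms.
    return v.deleteriousness * -math.log10(f1 * f2)


def gene_score(variants: list[Variant], gene_length_bp: int) -> float:
    """Sum variant scores for one gene in one individual, then divide by gene
    length (an assumed, simple form of the length correction)."""
    return sum(variant_score(v) for v in variants) / gene_length_bp


def mann_whitney_u(cases: list[float], controls: list[float]) -> tuple[float, float]:
    """U statistic for `cases` (count of pairs with case > control, ties count 0.5)
    and a two-sided p-value from the normal approximation, with no tie or
    continuity correction."""
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in cases for y in controls)
    n1, n2 = len(cases), len(controls)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    # Two-sided tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return u, math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Toy individual: one rare het, one very rare hom-alt, one hom-ref site.
    person = [Variant(0.01, 1, 0.9), Variant(0.001, 2, 0.95), Variant(0.3, 0, 0.2)]
    print(gene_score(person, gene_length_bp=3000))
    # Toy per-gene scores for cases versus controls.
    print(mann_whitney_u([0.012, 0.030, 0.008], [0.001, 0.004, 0.002]))
```

Because rare alleles give large -log10 terms and a homozygous call squares the rare frequency, a few rare, highly deleterious variants can dominate a gene's total, which matches the behaviour the abstract describes. For real analyses, an established statistics library offering tie and continuity corrections would be preferable to the hand-rolled normal approximation here.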

Text: GenePy main BMC Bioinfo 2904 - Accepted Manuscript (158kB)
Text: Figure 1 - Accepted Manuscript (2MB)
Text: Figure 2 - Accepted Manuscript (117kB)
Text: Figure 3 - Accepted Manuscript (54kB)
Text: s12859-019-2877-3 - Version of Record, available under a Creative Commons Attribution licence (1MB)

More information

Accepted/In Press date: 1 May 2019
Published date: 16 May 2019

Identifiers

Local EPrints ID: 430622
URI: https://eprints.soton.ac.uk/id/eprint/430622
ISSN: 1471-2105
PURE UUID: 94dbe55d-8d9b-4378-bd3d-6b7a685d68d0
ORCID for Reuben Pengelly: orcid.org/0000-0001-7022-645X
ORCID for Sarah Ennis: orcid.org/0000-0003-2648-0869

Catalogue record

Date deposited: 07 May 2019 16:30
Last modified: 20 Jul 2019 04:01

Contributors

Author: Enrico Mossotto
Author: James Ashton
Author: Luke O'Gorman
Author: Reuben Pengelly
Author: R. Mark Beattie
Author: Sarah Ennis
