The University of Southampton
University of Southampton Institutional Repository

Identification of disease-associated loci using machine learning for genotype and network data integration


Leal, Luis G., David, Alessia, Jarvelin, Marjo-Riitta, Sebert, Sylvain, Männikkö, Minna, Karhunen, Ville, Seaby, Eleanor, Hoggart, Clive and Sternberg, Michael J. E. (2019) Identification of disease-associated loci using machine learning for genotype and network data integration. Bioinformatics, 35 (24), 5182–5190. (doi:10.1093/bioinformatics/btz310).

Record type: Article

Abstract

Motivation: Integrating different omics data could markedly help to identify biological signatures, explain the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in genome-wide association studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture the weak loci that contribute to the missing heritability. Novel machine learning algorithms that merge genotype data with other omics data are needed, as they could enhance the prioritization of weak loci.
Results: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques for biological data. The method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify locus-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS worldwide and provided evidence supporting 85% of our findings (226 out of 265 genes), including recent associations in the literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently in the presence of strong population structure by accounting for the individuals’ ancestry. Because the method is flexible in incorporating diverse omics data sources, it can be easily adapted to the user’s research needs.
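
For readers unfamiliar with the factorization that cNMTF builds on, the following is a minimal sketch of plain (uncorrected) non-negative matrix tri-factorization, X ≈ F S Gᵀ, fitted with standard multiplicative updates for the Frobenius loss. It is not the authors' cNMTF: the ancestry correction and the network and variant-effect terms described in the abstract are omitted. The function name `nmtf`, its parameters and the toy matrix are assumptions made for illustration only.

```python
import numpy as np


def nmtf(X, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Plain NMTF sketch: approximate a non-negative X (n x m) as F @ S @ G.T.

    F (n x k1) holds row-cluster memberships, G (m x k2) holds
    column-cluster memberships and S (k1 x k2) links the two.
    This is a generic illustration, not the cNMTF algorithm itself.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random initialisation keeps all factors non-negative.
    F = rng.random((n, k1)) + eps
    S = rng.random((k1, k2)) + eps
    G = rng.random((m, k2)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates; eps guards against division by zero.
        F *= (X @ G @ S.T) / (F @ S @ G.T @ G @ S.T + eps)
        G *= (X.T @ F @ S) / (G @ S.T @ F.T @ F @ S + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps)
        # Track the Frobenius reconstruction error after each sweep.
        errors.append(np.linalg.norm(X - F @ S @ G.T))
    return F, S, G, errors


# Toy non-negative matrix with two row blocks and two column blocks.
# (Hypothetical data: rows might stand for variants, columns for individuals.)
rng = np.random.default_rng(42)
X = rng.random((6, 8)) * 0.1
X[:3, :4] += 1.0
X[3:, 4:] += 1.0

F, S, G, errors = nmtf(X, k1=2, k2=2)

# Hard cluster labels: the factor column with the largest weight per row.
row_clusters = F.argmax(axis=1)
col_clusters = G.argmax(axis=1)
print("row clusters:", row_clusters)
print("column clusters:", col_clusters)
print("final reconstruction error:", errors[-1])
```

In a sketch like this, the rows of F and G are read as soft cluster memberships, which is the sense in which tri-factorization acts as a co-clustering technique; any association scoring built on top of those memberships would be specific to the published method and is not reproduced here.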

This record has no associated files available for download.

More information

Accepted/In Press date: 25 April 2019
Published date: 15 December 2019

Identifiers

Local EPrints ID: 469980
URI: http://eprints.soton.ac.uk/id/eprint/469980
ISSN: 1367-4803
PURE UUID: ac51f656-dd1d-48e0-902e-7d2caf57f490
ORCID for Eleanor Seaby: orcid.org/0000-0002-6814-8648

Catalogue record

Date deposited: 29 Sep 2022 16:45
Last modified: 17 Mar 2024 04:05

Contributors

Author: Luis G. Leal
Author: Alessia David
Author: Marjo-Riitta Jarvelin
Author: Sylvain Sebert
Author: Minna Männikkö
Author: Ville Karhunen
Author: Eleanor Seaby
Author: Clive Hoggart
Author: Michael J. E. Sternberg
