Identification of disease-associated loci using machine learning for genotype and network data integration
Motivation: Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Novel machine learning algorithms that merge genotype data with other omics data are needed because they could improve the prioritization of weak loci.
Results: We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques for biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify locus-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS worldwide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in the literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently in the presence of strong population structure by accounting for the individuals’ ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user’s research needs.
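For readers unfamiliar with the factorization that cNMTF builds on, the following is a minimal sketch of plain non-negative matrix tri-factorization (X ≈ F S Gᵀ) using standard multiplicative updates. It is not the authors' cNMTF: the ancestry correction, the variant-damage scores and the gene-network terms described in the abstract are omitted, and the function name `nmtf`, its parameters and the toy matrix are assumptions made only for illustration. Pure Python is used to keep the example dependency-free; a real analysis would normally use an array library.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def frob_error(x, f, s, g):
    """Squared Frobenius norm of X - F S G^T."""
    approx = matmul(matmul(f, s), transpose(g))
    return sum((xv - av) ** 2 for xr, ar in zip(x, approx) for xv, av in zip(xr, ar))

def mult_update(m, numer, denom, eps=1e-12):
    """Element-wise update m * numer / denom; entries stay non-negative."""
    return [[mv * nv / (dv + eps) for mv, nv, dv in zip(mr, nr, dr)]
            for mr, nr, dr in zip(m, numer, denom)]

def nmtf(x, k1, k2, n_iter=200, seed=0):
    """Hypothetical helper: factorize non-negative X (n x m) as F (n x k1), S (k1 x k2), G (m x k2)."""
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random initialization.
    f = [[rng.random() + 0.1 for _ in range(k1)] for _ in range(n)]
    s = [[rng.random() + 0.1 for _ in range(k2)] for _ in range(k1)]
    g = [[rng.random() + 0.1 for _ in range(k2)] for _ in range(m)]
    xt = transpose(x)
    errors = [frob_error(x, f, s, g)]
    for _ in range(n_iter):
        # F <- F * (X G S^T) / (F S G^T G S^T)
        gst = matmul(g, transpose(s))
        f = mult_update(f, matmul(x, gst), matmul(f, matmul(transpose(gst), gst)))
        # G <- G * (X^T F S) / (G S^T F^T F S)
        fs = matmul(f, s)
        g = mult_update(g, matmul(xt, fs), matmul(g, matmul(transpose(fs), fs)))
        # S <- S * (F^T X G) / (F^T F S G^T G)
        ft = transpose(f)
        numer = matmul(matmul(ft, x), g)
        denom = matmul(matmul(matmul(ft, f), s), matmul(transpose(g), g))
        s = mult_update(s, numer, denom)
        errors.append(frob_error(x, f, s, g))
    return f, s, g, errors

# Toy non-negative matrix with two rough blocks (illustrative data only).
toy = [[5.0, 4.0, 0.0, 0.0],
       [4.0, 5.0, 0.0, 1.0],
       [0.0, 0.0, 3.0, 4.0],
       [1.0, 0.0, 4.0, 3.0]]
f, s, g, errors = nmtf(toy, k1=2, k2=2, n_iter=100, seed=1)
print("initial error:", errors[0], "final error:", errors[-1])
```

In a clustering reading of this sketch, the rows of F and G act as soft cluster memberships for the rows and columns of X, and S links row clusters to column clusters.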
5182–5190
Leal, Luis G.
David, Alessia
Jarvelin, Marjo-Riita
Sebert, Sylvain
Männikkö, Minna
Karhunen, Ville
Seaby, Eleanor
Hoggart, Clive
Sternberg, Michael J. E.
15 December 2019
Leal, Luis G., David, Alessia, Jarvelin, Marjo-Riita, Sebert, Sylvain, Männikkö, Minna, Karhunen, Ville, Seaby, Eleanor, Hoggart, Clive and Sternberg, Michael J. E. (2019) Identification of disease-associated loci using machine learning for genotype and network data integration. Bioinformatics, 35 (24), 5182–5190. (doi:10.1093/bioinformatics/btz310).
More information
Accepted/In Press date: 25 April 2019
Published date: 15 December 2019
Identifiers
Local EPrints ID: 469980
URI: http://eprints.soton.ac.uk/id/eprint/469980
ISSN: 1367-4803
PURE UUID: ac51f656-dd1d-48e0-902e-7d2caf57f490
Catalogue record
Date deposited: 29 Sep 2022 16:45
Last modified: 17 Mar 2024 04:05
Contributors
Author: Luis G. Leal
Author: Alessia David
Author: Marjo-Riita Jarvelin
Author: Sylvain Sebert
Author: Minna Männikkö
Author: Ville Karhunen
Author: Eleanor Seaby
Author: Clive Hoggart
Author: Michael J. E. Sternberg