Adjustment for gene expression PCA scores may induce reproducible false positive associations in eQTL analysis partly due to endogenous selection bias
Motivation: Expression quantitative trait loci (eQTL) analysis has been widely applied to map cis and trans regulatory elements of gene expression. Gene expression analysis is plagued with observed and latent biases such as batch effects and cell composition.
Principal component analysis (PCA) has the potential to capture these latent variables associated with global gene expression levels.
Adjustment of the eQTL model for principal components (PCs) of the gene expression increases the yield of statistically significant eQTLs and has consequently been widely adopted in the field. The explanation that accompanies the large increase in reproducible eQTLs is that adjustment for PCs reduces variation induced by technical and biological latent factors affecting global gene expression.
Results: Here we report that this practice may induce reproducible false positive results, partly due to endogenous selection bias. That is to say, adjusting for PCs may open a path between a gene's expression and a SNP that is not associated with that expression but is correlated with a PC that is in turn associated with it. Our simulation shows that false positive results induced by PCA adjustment can be reproducible. Real dataset analysis suggests that, in regression models that include a SNP not associated with expression but correlated with a PC, the two variables may mutually act as suppressor variables, thereby inflating SNP effect sizes. Similarly, adjustment for multiple PCs increases the R² of the regression model and thereby reduces the standard errors of the beta coefficients. Taken together, these two effects make p-values more significant and may induce false positives. We propose a simple procedure to detect whether SNPs and PCs are acting as suppressor variables.
Conclusions: We recommend a few techniques to deal with false positive associations in eQTLs and gene expression associations that arise from adjusting models for gene expression PCs.
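The mechanism described in the Results can be illustrated with a small simulation. The sketch below is not the author's simulation design; it is a minimal toy model under assumptions made here for illustration: a SNP `g` that truly affects a block of other genes (which is what makes it correlated with the first PC), a latent factor `u` independent of the SNP, and a target gene `y` that depends on `u` but not on `g`. All names, sample sizes, and effect sizes are hypothetical. In this setup the first PC is a common effect of `g` and `u`, so conditioning on it may induce an association between `g` and `y`.

```python
import numpy as np


def ols_t(y, X):
    """Fit y ~ X by ordinary least squares; return coefficients and t-statistics.

    X must already contain an intercept column.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)     # covariance of the coefficients
    return beta, beta / np.sqrt(np.diag(cov))


def simulate(seed, n=500, n_genes=50, snp_effect=1.0):
    """Return the SNP t-statistic for the target gene, without and with PC1 adjustment.

    Assumed toy model (hypothetical, for illustration only):
      g      : SNP genotype, coded 0/1/2
      u      : latent factor, independent of g
      others : genes driven by both g and u
      y      : target gene, driven by u only (no true SNP effect)
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n).astype(float)
    u = rng.normal(size=n)
    others = snp_effect * g[:, None] + u[:, None] + rng.normal(size=(n, n_genes))
    y = u + rng.normal(size=n)

    # First principal component scores of the centered expression matrix.
    expr = np.column_stack([others, y])
    expr = expr - expr.mean(axis=0)
    U, S, _ = np.linalg.svd(expr, full_matrices=False)
    pc1 = U[:, 0] * S[0]

    ones = np.ones(n)
    _, t_unadj = ols_t(y, np.column_stack([ones, g]))
    _, t_adj = ols_t(y, np.column_stack([ones, g, pc1]))
    return t_unadj[1], t_adj[1]


if __name__ == "__main__":
    # Independent seeds stand in for independent discovery/replication cohorts.
    for seed in range(5):
        t_unadj, t_adj = simulate(seed)
        print(f"seed={seed}  t(SNP) unadjusted={t_unadj:6.2f}  PC1-adjusted={t_adj:6.2f}")
```

Under these assumptions the unadjusted SNP t-statistic should hover around zero, while the PC1-adjusted one should be large in magnitude and carry the same sign across seeds, which is the sense in which such a false positive could replicate. The adjusted model also has a smaller residual variance, which is consistent with the point about a higher R² shrinking the standard errors.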
Couto Alves, Alexessander (2015) Adjustment for gene expression PCA scores may induce reproducible false positive associations in eQTL analysis partly due to endogenous selection bias. Cordell, Heather (ed.) In HUMAN HEREDITY: 44th European Mathematical Genetics Meeting (EMGM) 2016. Newcastle upon Tyne, UK, May 11-12, 2016: Abstracts. vol. 80, Karger. pp. 106-106. (doi:10.1159/000445228).
Record type: Conference or Workshop Item (Paper)
More information
Accepted/In Press date: 26 April 2015
Published date: 26 April 2015
Identifiers
Local EPrints ID: 509575
URI: http://eprints.soton.ac.uk/id/eprint/509575
PURE UUID: d0a91dde-f96b-4189-b995-890131131fdc
Catalogue record
Date deposited: 25 Feb 2026 17:57
Last modified: 26 Feb 2026 03:12
Contributors
Author: Alexessander Couto Alves
Editor: Heather Cordell