The University of Southampton
University of Southampton Institutional Repository

Outlier detection at the transcriptome-proteome interface


Gunawardana, Y., Fujiwara, S., Takeda, A., Woelk, C.H. and Niranjan, Mahesan (2015) Outlier detection at the transcriptome-proteome interface. Bioinformatics (PMID:25819671).

Record type: Article

Abstract

BACKGROUND:

In high-throughput experimental biology, it is widely acknowledged that expression levels measured at the transcriptome and at the corresponding proteome do not, in general, correlate well; nevertheless, messenger RNA (mRNA) levels are used as convenient proxies for protein levels. Our interest is in developing data-driven computational models that can bridge the gap between these two levels of measurement, at which different regulatory mechanisms may act on different molecular species and cause any observed lack of correlation. To this end, we build data-driven predictors of protein levels using mRNA levels and known proxies of translation efficiency as covariates. Previous work showed that, in such a setting, outliers with respect to the model are reliable candidates for post-translational regulation.
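The idea of a protein-level predictor whose outliers are candidates for post-translational regulation can be illustrated with a minimal sketch. It assumes a linear model fitted by ordinary least squares on log-scale values and flags the proteins with the largest absolute residuals; the abstract does not specify the model form, so the covariate layout (intercept, log mRNA, one translation-efficiency proxy), the function names and the toy numbers are all hypothetical.

```python
def solve(A, b):
    """Solve the square system A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def fit_least_squares(X, y):
    """Ordinary least squares via the normal equations X^T X w = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


def predict(X, w):
    return [sum(xi * wi for xi, wi in zip(row, w)) for row in X]


def residual_outliers(X, y, k):
    """Indices of the k proteins with the largest absolute residual under the OLS fit."""
    w = fit_least_squares(X, y)
    res = [yi - fi for yi, fi in zip(y, predict(X, w))]
    return sorted(range(len(y)), key=lambda i: abs(res[i]), reverse=True)[:k]


# Hypothetical toy data: columns are intercept, log mRNA, translation-efficiency proxy.
m = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
t = [0.2, 0.5, 0.1, 0.9, 0.4, 0.7, 0.3, 0.6]
X = [[1.0, mi, ti] for mi, ti in zip(m, t)]
y = [0.5 + mi + 0.3 * ti for mi, ti in zip(m, t)]  # log protein, exactly linear
y[4] -= 3.0  # protein 4 sits far below its predicted level
print(residual_outliers(X, y, 1))  # [4]
```

Because every protein contributes to the least-squares fit, a strong outlier also pulls the fitted line toward itself; this is the motivation for fits that reject outliers explicitly.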
RESULTS:

Here, we present and compare two novel formulations for deriving a protein concentration predictor from which outliers may be extracted in a systematic manner. The first approach, outlier-rejecting regression, allows a specified fraction of the data to be explicitly designated as outliers. In a regression setting, this is a non-convex optimization problem, which we solve by deriving a difference of convex functions algorithm (DCA). For post-translationally regulated proteins, one expects concentrations to be affected primarily by disruption of protein stability. Our second algorithm exploits this observation by minimizing an asymmetric loss using quantile regression, and extracts outlier proteins whose measured concentrations are lower than a genome-wide regression would predict. We validate the two approaches on a dataset of yeast transcriptome and proteome measurements. Functional annotation checks on the detected outliers demonstrate that the methods are able to identify post-translationally regulated genes with high statistical confidence.
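Both formulations can be sketched for a single covariate (log mRNA) under stated assumptions; this is an illustration, not the authors' implementation. For outlier-rejecting regression we assume a squared loss, so that rejecting exactly k points gives the objective "sum of all squared residuals minus the sum of the k largest". Both terms are convex in the line parameters, so this is a difference of convex functions and DCA applies: each step linearises the subtracted term at the current fit, and the resulting convex problem turns out to be least squares with the current k worst points' targets replaced by their predictions. For the asymmetric loss we assume the standard pinball (check) loss of quantile regression with tau = 0.75, so points lying below the line are penalised lightly and do not drag the fit down. The names `dca_trimmed_line`, `quantile_line` and `low_outliers`, the value of tau and the toy data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares line y ~ a + b*x (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def dca_trimmed_line(x, y, k, iters=500, tol=1e-12):
    """DCA for min_{a,b} sum_i r_i^2 - (sum of the k largest r_i^2), r_i = y_i - a - b*x_i.

    Linearising the subtracted convex term at the current fit yields a least-squares
    problem in which the current k worst-fitting points get their predictions as targets.
    """
    a, b = fit_line(x, y)  # start from the ordinary fit
    for _ in range(iters):
        pred = [a + b * xi for xi in x]
        worst = set(sorted(range(len(y)), key=lambda i: (y[i] - pred[i]) ** 2, reverse=True)[:k])
        y_adj = [pred[i] if i in worst else y[i] for i in range(len(y))]
        a_new, b_new = fit_line(x, y_adj)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    res = [y[i] - (a + b * x[i]) for i in range(len(y))]
    flagged = sorted(sorted(range(len(y)), key=lambda i: res[i] ** 2, reverse=True)[:k])
    return (a, b), flagged


def pinball(r, tau):
    """Asymmetric check loss for residual r = observed - predicted."""
    return tau * r if r >= 0 else (tau - 1.0) * r


def quantile_line(x, y, tau):
    """Exact linear quantile regression for small n: some optimal line passes through
    two observations, so every pair with distinct x is tried (O(n^3) brute force)."""
    best = None
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] == x[j]:
                continue
            b = (y[j] - y[i]) / (x[j] - x[i])
            a = y[i] - b * x[i]
            loss = sum(pinball(yk - a - b * xk, tau) for xk, yk in zip(x, y))
            if best is None or loss < best[0]:
                best = (loss, a, b)
    return best[1], best[2]


def low_outliers(x, y, tau, k):
    """The k proteins lying furthest below the tau-quantile line (most negative residuals)."""
    a, b = quantile_line(x, y, tau)
    res = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return sorted(sorted(range(len(y)), key=lambda i: res[i])[:k])


# Hypothetical toy data: x = log mRNA, y = log protein; proteins 2 and 6 are "degraded".
x = [float(i) for i in range(10)]
y = [1.0 + 0.8 * xi for xi in x]
y[2] -= 3.0
y[6] -= 3.0
(a, b), flagged = dca_trimmed_line(x, y, k=2)
print(round(a, 6), round(b, 6), flagged)  # 1.0 0.8 [2, 6]
print(low_outliers(x, y, tau=0.75, k=2))  # [2, 6]
```

Enumerating pairs is only practical for toy data; a genome-scale fit would use a linear-programming or interior-point quantile regression solver instead. The DCA loop decreases the trimmed objective monotonically but, as with any DCA, it may stop at a local rather than a global optimum, so the starting fit matters.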

Full text not available from this repository.

More information

Accepted/In Press date: 24 March 2015
e-pub ahead of print date: 29 March 2015
Organisations: Clinical & Experimental Sciences

Identifiers

Local EPrints ID: 379219
URI: http://eprints.soton.ac.uk/id/eprint/379219
ISSN: 1367-4803
PURE UUID: dde69284-f949-456a-96e5-c968b2db6b94

Catalogue record

Date deposited: 18 Jul 2015 14:41
Last modified: 11 Nov 2017 04:00


Contributors

Author: Y. Gunawardana
Author: S. Fujiwara
Author: A. Takeda
Author: C.H. Woelk
Author: Mahesan Niranjan
