Modelling at the transcriptome - proteome interface

Gunawardana, Yawwani P. (2015) Modelling at the transcriptome - proteome interface. University of Southampton, Physical Sciences and Engineering, Doctoral Thesis, 211pp.

Record type: Thesis (Doctoral)

Abstract

In high-throughput experimental biology, mRNA expression levels and the corresponding protein abundances are commonly analysed jointly to examine the relationship between these two omic measurements. While some experiments have shown good correlation between transcriptome and proteome for some species under different conditions, such correlation is not universal because of post-transcriptional and post-translational regulation. Bridging the gap between transcriptome and proteome measurements therefore allows us to uncover useful biological insights into these forms of regulation, which are important for studying the protein generation process and several disease conditions. We develop a data-driven predictor that uses transcriptome-layer properties as proxies for protein abundance, and employ the model in a novel manner to detect post-translationally regulated proteins, hypothesizing that model failures (outlier proteins) occur because post-translational modifications (PTMs) disrupt protein stability. Three outlier detection techniques were employed with our protein abundance predictor to detect post-translationally regulated proteins: (1) a simple linear regression model, which detects outliers by inspecting the scatter plot of predicted against measured protein abundance; (2) an Outlier Rejecting Regression (ORR) model, a novel mathematical formulation that returns a user-specified fraction of the data as outliers by solving a non-convex optimization problem using the Difference of Convex functions Algorithm (DCA); and (3) Quantile Regression (QR), which employs an asymmetric loss model to detect only outliers with negative losses, used here for the first time in omics. The proteins extracted as outliers by these techniques confirmed our hypothesis on post-translational regulation (PTR), yielding high statistical confidence for functional annotations and pathway information. This data-driven framework can therefore be used by biologists as a reliable technique to reduce the laboratory experimental work involved in detecting post-translationally regulated proteins.
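
The asymmetric loss behind the quantile regression approach can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a single predictor (an mRNA-derived score), reads "negative losses" as negative residuals (measured abundance below the fitted line), and fits the line by exhaustive search over pairs of points, which is only practical for toy data. The function names and the example values are hypothetical.

```python
from itertools import combinations


def pinball_loss(residual, tau):
    """Asymmetric (pinball) loss for a quantile level tau in (0, 1)."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def fit_quantile_line(x, y, tau):
    """Fit y ~ a + b*x at quantile tau by brute force.

    With an intercept and one predictor, an optimal quantile-regression
    line can be chosen to pass through two data points, so for small
    data sets every pair of points can simply be tried.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid fit
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(pinball_loss(yk - (a + b * xk), tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


def flag_low_outliers(x, y, tau=0.25, tol=1e-9):
    """Return indices of points lying strictly below the tau-quantile line."""
    a, b = fit_quantile_line(x, y, tau)
    return [k for k, (xk, yk) in enumerate(zip(x, y)) if yk - (a + b * xk) < -tol]


# Toy data (made up): protein abundance tracks the mRNA score,
# except for one protein that is far less abundant than predicted.
mrna = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
protein = [2, 4, 6, 8, 1, 12, 14, 16, 18, 20]
print(flag_low_outliers(mrna, protein, tau=0.25))  # -> [4]
```

Choosing a low quantile level penalizes points below the line more heavily than points above it, so only a small share of proteins ends up under the fit; in this sketch those are the candidates for reduced stability.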

We also perform a thorough inference analysis on the most commonly used high-throughput measurements, microarray and RNA-Seq, using several machine learning inference techniques, to examine whether their high numerical precision provides additional information about a gene beyond a binary representation of its on/off status. We perform this analysis at the transcriptome level and also at the proteome level, as an extended experimental setting of our PTR detection framework. These analyses suggest that binarized mRNA concentrations, measured using high-throughput RNA-Seq and microarray technologies, are sufficient for machine learning inference with accuracy similar to that obtained from continuous measurements, not only at the transcriptome level but also at the proteome level, where they can be used to predict protein abundance and to detect proteins under post-translational regulation with a high level of confidence.
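
As a small illustration of what a binary on/off representation of expression data can look like, the sketch below thresholds each gene at its median across samples. The abstract does not state how the binarization was performed, so the median threshold and the function names are assumptions made purely for illustration.

```python
from statistics import median


def binarize_gene(values, threshold=None):
    """Map one gene's expression values to 1 (on) or 0 (off).

    If no threshold is given, the gene's own median across samples
    is used as the cut-off (an assumption, not the thesis method).
    """
    cut = median(values) if threshold is None else threshold
    return [1 if v > cut else 0 for v in values]


def binarize_matrix(expression):
    """Binarize each gene (row) independently across samples (columns)."""
    return [binarize_gene(row) for row in expression]


# Hypothetical expression values: two genes measured in four samples.
expression = [
    [0.2, 7.5, 3.1, 9.8],
    [5.0, 1.0, 4.0, 0.5],
]
print(binarize_matrix(expression))  # -> [[0, 1, 0, 1], [1, 0, 1, 0]]
```

The binarized matrix can then be passed to the same inference method as the continuous values, so that the two representations are compared on equal terms.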
