The University of Southampton
University of Southampton Institutional Repository

Integrative modelling of protein abundance via sequence information


Parkes, Gregory, Michael (2022) Integrative modelling of protein abundance via sequence information. University of Southampton, Doctoral Thesis, 223pp.

Record type: Thesis (Doctoral)

Abstract

Understanding the complex interactions between the transcriptome and proteome is essential to uncovering cellular mechanisms in both health and disease. The underwhelming correlation between corresponding transcript and protein abundances suggests that regulatory processes tightly govern information flow around transcription, translation and post-translation, particularly in higher-order organisms. Inherent difficulties in measuring the global proteome make modelling protein abundance via proxies desirable, given the pivotal role that intracellular proteins play in cell regulation and function. In this thesis, a protein abundance predictor is developed across the human cell cycle using mRNA and translation abundance, showing that mRNA level alone insufficiently explains the transcriptome-proteome relationship. To expand the feature space, some 30 sequence-derived features (SDFs) that impact proteins before translation were engineered, and we demonstrated in our published works that overestimated outliers to fitted models (r² = 0.67) are associated with post-translational regulation and degradation. This motivated extending the use of sequence-engineered features as generalized predictors of expression: a large dataset covering the entire human transcriptome was curated to derive over 180 new features, spanning from the genome to estimated post-translational modifications. SDFs were designed with scale and generality in mind, allowing their application in a variety of ’omic studies. This newly generated resource was validated by systematically analysing intra-feature correlations and applying unsupervised learning techniques to mitigate inevitable multicollinearity. Finally, global protein abundance prediction using SDFs was attempted, finding that sequence information alone yields model scores of r² = 0.45, with the inclusion of mRNA abundance adding 5% to the explained model variance.
Unpacking fitted SDF models using gene ontology analysis revealed a close relationship between SDFs and translation, helping to explain their improved model performance over mRNA level. This data-driven approach helps to isolate proteins of interest by outlier detection, with SDF use biased towards predicting steady-state protein abundance.

Text
THESIS - Version of Record
Available under License University of Southampton Thesis Licence.
Download (10MB)
Text
PTD_Thesis_Parkes-SIGNED
Restricted to Repository staff only

More information

Submitted date: August 2021
Published date: February 2022

Identifiers

Local EPrints ID: 457300
URI: http://eprints.soton.ac.uk/id/eprint/457300
PURE UUID: f6f60585-93c2-4712-a985-10bc962bbd79
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 31 May 2022 16:37
Last modified: 17 Mar 2024 03:11

Contributors

Author: Gregory Michael Parkes
Thesis advisor: Mahesan Niranjan
