The University of Southampton
University of Southampton Institutional Repository

Working across the Omic scales in high throughput data-driven biology

University of Southampton
Jeyananthan, Pratheeba
f4c533ad-d3f4-43c5-ae6e-5f201729b71d
Niranjan, Mahesan
5cbaeea8-7288-4b55-a89c-c43d212ddd4f

Jeyananthan, Pratheeba (2020) Working across the Omic scales in high throughput data-driven biology. University of Southampton, Doctoral Thesis, 201pp.

Record type: Thesis (Doctoral)

Abstract

The expression of a gene, encoded as a sequence of nucleotides in the genome of an organism, and its regulation can be observed at various levels: messenger RNA levels, translation levels in the ribosome, the cellular concentration of the resulting protein, mutations and aberrations at the sequence level, epigenetic markers, and regulation by microRNAs. The bulk of research in bioinformatics, however, makes high throughput measurements at only one of these ‘levels’ (or ‘views’) and looks for their functional implications, such as finding biomarkers of complex diseases. Such analyses ignore relationships that exist across these levels of observation and can also lead to misleading results: for example, genes that show similar expression between two cohorts of interest may be differently expressed at the protein level because regulation is the cause of the difference between the cohorts. In this work, I consider approaches for integrative analysis across these levels of gene expression and study examples from classification and regression (survival prediction). For these studies, I use publicly archived data from The Cancer Genome Atlas (TCGA). For classification and regression, I carry out extensive feature selection using the Fisher Ratio and Greedy Forward Feature Selection algorithms, respectively. I quantify the performance of classifiers designed using one view of the data and evaluated on another. I then carry out feature selection in an integrative way and find that integrative analysis of multi-omic data might not perform significantly better than the best single level. Another study uses an in-house dataset measured in a single laboratory. I use these data to show the importance of measuring translation rates (monosome and polysome) in studies of transcriptome-proteome correlation.
I binarise the mRNA data in this analysis to reduce the effect of errors in the numerical precision of the biological experiments, cluster the genes based on their binary values, and use Gene Ontology (GO) analysis to validate the clusters biologically. Within these clusters, I show that including translation rates in protein prediction models may explain the protein level more accurately than total mRNA alone (as conventionally measured). I also show the impact of some sequence-derived features on these models and how this impact varies between clusters. Hence, applying the full feature set to all genes in bulk is not an efficient way of doing this prediction.

Text
Final thesis unsigned
Available under License University of Southampton Thesis Licence.
Download (10MB)
Text
PTD Signed
Restricted to Repository staff only

More information

Published date: April 2020

Identifiers

Local EPrints ID: 447740
URI: http://eprints.soton.ac.uk/id/eprint/447740
PURE UUID: 58e13bd3-afd3-41e6-b643-447b19f499b0
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 19 Mar 2021 17:31
Last modified: 17 Mar 2024 06:26

Contributors

Author: Pratheeba Jeyananthan
Thesis advisor: Mahesan Niranjan


