Inference from binary gene expression data

Tuna, Salih (2009) Inference from binary gene expression data. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 163pp.

Record type: Thesis (Doctoral)

Abstract

Microarrays provide a practical method for measuring the mRNA abundances of thousands of genes in a single experiment. Analysing such large dimensional data is a challenge which attracts researchers from many different fields and machine learning is one of them. However, the biological properties of mRNA such as its low stability, measurements being taken from a population of cells rather than from a single cell, etc. should make researchers sceptical about the high numerical precision reported and thus the reproducibility of these measurements. In this study we explore data representation at lower numerical precision, down to binary (retaining only the information whether a gene is expressed or not), thereby improving the quality of inferences drawn from microarray studies. With binary representation, we propose a solution to reduce the effect of algorithmic choice in the pre-processing stages.
First we compare the information loss if researchers made the inferences from quantized transcriptome data rather than the continuous values. Classification, clustering, periodicity detection and analysis of developmental time series data are considered here. Our results showed that there is not much information loss with binary data. Then, by focusing on the two most widely used inference tools, classification and clustering, we show that inferences drawn from transcriptome data can actually be improved with a metric suitable for binary data. This is explained with the uncertainties of the probe level data. We also show that binary transcriptome data can be used in cross-platform studies and when used with Tanimoto kernel, this increase the performance of inferences when compared to individual datasets.
In the last part of this work we show that binary transcriptome data reduces the effect of algorithm choice for pre-processing raw data. While there are many different algorithms for pre-processing stages there are few guidelines for the users as to which one to choose. In many studies it has been shown that the choice of algorithms has significant impact on the overall results of microarray studies. Here we show in classification, that if transcriptome data is binarized after pre-processed with any combination of algorithms it has the effect of reducing the variability of the results and increasing the performance of the classifier simultaneously.

Text

Thesis.pdf - Other

Download (3MB)

More information

Published date: October 2009

Organisations: University of Southampton

Identifiers

Local EPrints ID: 69359

URI: http://eprints.soton.ac.uk/id/eprint/69359

PURE UUID: 2cb01126-cc4f-41b4-8ae4-8765122986f5

ORCID for Mahesan Niranjan:

orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 05 Nov 2009

Last modified: 14 Mar 2024 02:53

Export record

Share this record

Share this on Facebook Share this on Twitter Share this on Weibo

Contributors

Author: Salih Tuna

Thesis advisor: Mahesan Niranjan

Download statistics

Downloads from ePrints over the past year. Other digital versions may also be available to download e.g. from the publisher's website.

View more statistics

Library staff additional information