The University of Southampton
University of Southampton Institutional Repository

Classification with binary gene expressions

Tuna, Salih and Niranjan, Mahesan (2009) Classification with binary gene expressions. Journal of Biomedical Science and Engineering, 2 (6), 390-399.

Record type: Article

Abstract

Microarray gene expression measurements are usually reported, used and archived to high numerical precision. However, properties of mRNA molecules, such as their low stability and availability in small copy numbers, and the fact that measurements correspond to a population of cells rather than a single cell, make high precision meaningless. Recent work shows that reducing measurement precision leads to very little loss of information, right down to binary levels. In this paper we show how properties of binary spaces can be useful in making inferences from microarray data. In particular, we use the Tanimoto similarity metric for binary vectors, which has been used effectively in the chemoinformatics literature for retrieving chemical compounds with certain functional properties. This measure, when incorporated in a kernel framework, helps recover any information lost by quantization. By implementing a spectral clustering framework, we further show that a second reason for the high performance of the Tanimoto metric can be traced back to a hitherto unnoticed systematic variability in array data: probe-level uncertainties are systematically lower for arrays with large numbers of expressed genes. While we offer no molecular-level explanation for this systematic variability, the fact that it can be exploited in a suitable similarity metric is a useful observation in itself. We also show preliminary results indicating that working with binary data considerably reduces the variability in results across the choice of algorithms in the pre-processing stages of microarray analysis.
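
The Tanimoto similarity named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the thresholding rule used to binarize expression values, the threshold value and the convention of returning 0 when neither array has any expressed gene are all assumptions made for illustration. For two binary vectors, the similarity is the number of genes expressed in both arrays divided by the number expressed in at least one of them.

```python
def binarize(profile, threshold):
    """Map expression values to 0/1 (assumed rule: 1 if value > threshold)."""
    return [1 if value > threshold else 0 for value in profile]


def tanimoto(x, y):
    """Tanimoto similarity between two equal-length binary vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    both = sum(a & b for a, b in zip(x, y))    # expressed in both arrays
    either = sum(a | b for a, b in zip(x, y))  # expressed in at least one
    # Assumed convention: similarity is 0 when nothing is expressed in either.
    return both / either if either else 0.0


def tanimoto_kernel(profiles):
    """Pairwise similarity matrix that a kernel method could consume."""
    return [[tanimoto(p, q) for q in profiles] for p in profiles]


# Hypothetical expression values for two arrays over four genes.
arrays = [[5.0, 0.2, 3.1, 0.1],
          [4.0, 2.5, 0.3, 0.0]]
binary = [binarize(a, threshold=1.0) for a in arrays]
K = tanimoto_kernel(binary)
```

In this toy example the two arrays share one expressed gene out of three expressed in either, so the off-diagonal entry of K is 1/3 and each diagonal entry is 1. A matrix of this kind could be passed to a kernel classifier or a spectral clustering routine that accepts a precomputed similarity matrix.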

Text
tuna_jbise.pdf - Other (164kB)

More information

Published date: 2009
Organisations: Southampton Wireless Group

Identifiers

Local EPrints ID: 268186
URI: http://eprints.soton.ac.uk/id/eprint/268186
ISSN: 1937-6871
PURE UUID: ffe2b628-a5b8-43ae-88f6-5107a8c8940e
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 11 Nov 2009 14:19
Last modified: 15 Mar 2024 03:29

Contributors

Author: Salih Tuna
Author: Mahesan Niranjan
