The University of Southampton
University of Southampton Institutional Repository

Exploiting gene expression and protein data for predicting remote homology and tissue specificity

Record type: Thesis (Doctoral)

In this thesis I describe my investigations into applying machine learning methods to high-throughput experimental and predicted biological data. The importance of such analysis as a means of making inferences about biological functions is widely acknowledged in the bioinformatics community. Specifically, this work makes three novel contributions based on the systematic analysis of publicly archived data on protein sequences, three-dimensional structures, gene expression and functional annotations: (a) remote homology detection based on amino acid sequences and secondary structures; (b) the analysis of tissue-specific gene expression for predictive signals in the sequence and secondary structure of the resulting protein product; and (c) a study of ageing in the fruit fly, a commonly used model organism, in which tissue-specific and whole-organism gene expression changes are contrasted.

For remote homology detection, a kernel-based method that combines pairwise alignment scores of amino acid sequences and secondary structures is shown to improve prediction accuracy in a benchmark task defined using the Structural Classification of Proteins (SCOP) database. While the task of predicting SCOP superfamilies should be regarded as an easy one, with little room for performance improvement, it is still widely accepted as the gold standard due to careful manual annotation by experts in protein evolution.
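As an illustration of how two sets of pairwise alignment scores might be combined into a single kernel, the sketch below normalises each score matrix and takes a weighted sum. This is a minimal sketch under stated assumptions, not the method used in the thesis: the function names, the cosine-style normalisation and the `weight` parameter are all hypothetical, and raw alignment scores are not guaranteed to form a valid (positive semidefinite) kernel.

```python
import math


def normalise_scores(scores):
    """Scale a symmetric score matrix so that every self-score becomes 1.

    Assumption: self-alignment scores on the diagonal are strictly positive.
    """
    n = len(scores)
    for i in range(n):
        if scores[i][i] <= 0:
            raise ValueError("self-alignment scores must be positive")
    return [
        [scores[i][j] / math.sqrt(scores[i][i] * scores[j][j]) for j in range(n)]
        for i in range(n)
    ]


def combine_kernels(seq_scores, ss_scores, weight=0.5):
    """Weighted sum of normalised sequence and secondary-structure scores.

    `weight` is a hypothetical mixing parameter in [0, 1].
    """
    if len(seq_scores) != len(ss_scores):
        raise ValueError("both matrices must cover the same proteins")
    k_seq = normalise_scores(seq_scores)
    k_ss = normalise_scores(ss_scores)
    n = len(k_seq)
    return [
        [weight * k_seq[i][j] + (1.0 - weight) * k_ss[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy pairwise scores for two proteins (invented values, for illustration only).
seq_scores = [[4.0, 2.0], [2.0, 4.0]]
ss_scores = [[9.0, 3.0], [3.0, 9.0]]
combined = combine_kernels(seq_scores, ss_scores)
```

The resulting matrix could be passed to any classifier that accepts a precomputed kernel, such as a support vector machine.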

A similar method is introduced to investigate whether tissue specificity of gene expression is correlated with the sequence and secondary structure of the resulting protein product. An information-theoretic approach is adopted for sorting fruit fly and mouse genes according to their tissue specificity based on gene expression data. A classifier is then trained to predict the degree of specificity for these genes. The study concludes that the tissue specificity of gene expression is correlated with the sequence and, to a certain extent, with the secondary structure of the gene’s protein product.
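One common way to score tissue specificity with Shannon's information theory is to treat a gene's relative expression across tissues as a probability distribution and compute its entropy: low entropy indicates expression concentrated in a few tissues. The sketch below illustrates that idea only; the function names and the toy expression values are hypothetical, and the exact formulation used in the thesis may differ.

```python
import math


def tissue_entropy(expression):
    """Shannon entropy (in bits) of a gene's relative expression across tissues."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("expression values must sum to a positive number")
    entropy = 0.0
    for value in expression:
        if value > 0:  # zero-expression tissues contribute nothing
            p = value / total
            entropy -= p * math.log2(p)
    return entropy


def rank_by_specificity(profiles):
    """Return gene names sorted from most tissue-specific (lowest entropy) to least."""
    return sorted(profiles, key=lambda gene: tissue_entropy(profiles[gene]))


# Toy expression profiles over four tissues (invented values, for illustration only).
profiles = {
    "gene_a": [100.0, 0.0, 0.0, 0.0],    # expressed in a single tissue
    "gene_b": [25.0, 25.0, 25.0, 25.0],  # uniformly expressed
    "gene_c": [70.0, 10.0, 10.0, 10.0],  # enriched in one tissue
}
ranking = rank_by_specificity(profiles)
```

A sorted list of this kind is what allows genes to be binned by degree of specificity before a classifier is trained on them.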

The sorted list of genes introduced in the previous chapter is used to investigate the tissue specificity of transcript profiles obtained from a study of ageing in the fruit fly. The same list is utilised to investigate how filtering tissue-restricted genes affects gene set enrichment analysis in the ageing study, and to examine the specificity of age-associated genes identified in the literature. The conclusion drawn in this chapter is that categorisation of genes according to their tissue specificity using Shannon’s information theory is useful for the interpretation of whole-fly gene expression data.

Citation

Wieser, Daniela (2010) Exploiting gene expression and protein data for predicting remote homology and tissue specificity. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 252pp.

More information

Published date: June 2010
Organisations: University of Southampton

Identifiers

Local EPrints ID: 159177
URI: http://eprints.soton.ac.uk/id/eprint/159177
PURE UUID: d426a71f-e010-4894-ae22-d92491d26dfb

Catalogue record

Date deposited: 15 Jul 2010 16:00
Last modified: 18 Jul 2017 12:37

Contributors

Author: Daniela Wieser
Thesis advisor: Mahesan Niranjan
