Mining Protein Database using Machine Learning Techniques


Niranjan, Mahesan (2008) Mining Protein Database using Machine Learning Techniques. Journal of Integrative Bioinformatics, 5, (2), 1-10.

Warning: There is a more recent version of this item available.

Description/Abstract

With a large amount of information relating to proteins accumulating in widely available online databases, it is of interest to apply machine learning techniques that, by extracting underlying statistical regularities in the data, make predictions about the functional and evolutionary characteristics of unseen proteins. Such predictions can help reduce the space over which experiment designers need to search in order to improve our understanding of the biochemical properties of proteins. It has previously been suggested that features computable by comparing a pair of proteins can be integrated by an artificial neural network, which then predicts the degree to which the two proteins may be evolutionarily related and homologous. We compiled two datasets of pairs of proteins, each pair being characterised by seven distinct features. We performed an exhaustive search through all possible combinations of features for the problem of separating remote homologous pairs from analogous pairs, and note that a significant performance gain was obtained by the inclusion of sequence and structure information. We find that a linear classifier was enough to discriminate protein pairs at the family level. At the superfamily level, however, detecting remote homologous pairs was a relatively harder problem, and we find that nonlinear classifiers achieve significantly higher accuracies. In this paper, we compare three different pattern classification methods on two problems formulated as detecting evolutionary and functional relationships between pairs of proteins, and from extensive cross-validation and feature-selection studies we quantify the average limits and uncertainties with which such predictions may be made. Feature selection points to a "knowledge gap" in currently available functional annotations. We demonstrate how the scheme may be employed in a framework to associate an individual protein with an existing family of evolutionarily related proteins.
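The exhaustive search over feature combinations mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the seven features or the three classifiers, so the sketch uses synthetic data in which only features 0 and 1 are informative, a nearest-centroid rule as a stand-in linear classifier, and plain k-fold cross-validation. The function names (`make_toy_pairs`, `cv_accuracy`, `exhaustive_feature_search`) are hypothetical and are not taken from the paper.

```python
from itertools import combinations
import random


def nearest_centroid_fit(X, y):
    """Return the mean feature vector of each class (a simple linear rule)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def nearest_centroid_predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[c]))
    return min(centroids, key=sqdist)


def cv_accuracy(X, y, feature_idx, k=5):
    """Mean k-fold accuracy using only the columns listed in feature_idx."""
    Xs = [[row[j] for j in feature_idx] for row in X]
    n = len(Xs)
    scores = []
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        model = nearest_centroid_fit([Xs[i] for i in train],
                                     [y[i] for i in train])
        correct = sum(nearest_centroid_predict(model, Xs[i]) == y[i]
                      for i in test)
        scores.append(correct / len(test))
    return sum(scores) / k


def exhaustive_feature_search(X, y, k=5):
    """Score every non-empty feature subset; 7 features give 2**7 - 1 = 127."""
    n_features = len(X[0])
    results = {}
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            results[subset] = cv_accuracy(X, y, subset, k)
    return results


def make_toy_pairs(n=200, n_features=7, seed=0):
    """Synthetic stand-in data: only features 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # hypothetical labels, e.g. 1 = homologous pair
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 3.0 * label
        row[1] += 3.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_pairs()
    results = exhaustive_feature_search(X, y)
    best = max(results, key=results.get)
    print("subsets evaluated:", len(results))
    print("best subset:", best, "accuracy:", round(results[best], 3))
```

With only seven features the search space is small enough to enumerate in full, which is what makes an exhaustive comparison practical; a nonlinear classifier could be substituted for the nearest-centroid rule inside `cv_accuracy` without changing the search loop.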

Item Type: Article
ISSNs: 1613-4516
Divisions: Faculty of Physical Sciences and Engineering > Electronics and Computer Science
ePrint ID: 266687
Date Deposited: 20 Sep 2008 06:55
Last Modified: 27 Mar 2014 20:12
URI: http://eprints.soton.ac.uk/id/eprint/266687
