Mining Protein Database using Machine Learning Techniques

Camargo, Renata and Niranjan, Mahesan (2008) Mining Protein Database using Machine Learning Techniques. Journal of Integrative Bioinformatics, 5 (2), pp. 1-10.


With a large amount of information about proteins accumulating in databases that are widely available online, it is of interest to apply machine learning techniques that extract underlying statistical regularities in the data and use them to predict the functional and evolutionary characteristics of unseen proteins. Such predictions can reduce the space that experiment designers need to search in order to improve our understanding of the biochemical properties of proteins. It has previously been suggested that features computed by comparing a pair of proteins can be integrated by an artificial neural network to predict the degree to which the two proteins are evolutionarily related and homologous. We compiled two datasets of protein pairs, each pair being characterised by seven distinct features. For the problem of separating remote homologous pairs from analogous pairs, we performed an exhaustive search through all possible combinations of features and note that a significant performance gain was obtained by including sequence and structure information. We find that a linear classifier was sufficient to discriminate protein pairs at the family level. At the superfamily level, however, detecting remote homologous pairs was a harder problem, and here we find that nonlinear classifiers achieve significantly higher accuracies. In this paper, we compare three different pattern classification methods on two problems formulated as detecting evolutionary and functional relationships between pairs of proteins and, from extensive cross-validation and feature selection studies, quantify the average limits and uncertainties with which such predictions may be made. Feature selection points to a "knowledge gap" in currently available functional annotations. We demonstrate how the scheme may be employed in a framework to associate an individual protein with an existing family of evolutionarily related proteins.
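
The exhaustive search over feature combinations described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic (seven random features per hypothetical protein pair, two of them made informative), the classifier is a simple nearest-centroid rule standing in for a linear classifier, and the function names are invented. It is not the authors' pipeline, features, or data.

```python
import itertools
import random


def make_synthetic_pairs(n=200, n_features=7, informative=(0, 3), seed=0):
    """Hypothetical data: each row stands for one protein pair with 7 features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate the two classes so every fold sees both
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label  # only these features carry class signal
        X.append(row)
        y.append(label)
    return X, y


def centroid(rows):
    """Mean vector of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def cv_accuracy(X, y, subset, k=5):
    """k-fold cross-validated accuracy of a nearest-centroid rule on `subset`."""
    Xs = [[row[j] for j in subset] for row in X]
    correct = 0
    for fold in range(k):
        train = [(x, t) for i, (x, t) in enumerate(zip(Xs, y)) if i % k != fold]
        test = [(x, t) for i, (x, t) in enumerate(zip(Xs, y)) if i % k == fold]
        c0 = centroid([x for x, t in train if t == 0])
        c1 = centroid([x for x, t in train if t == 1])
        for x, t in test:
            pred = 1 if squared_distance(x, c1) < squared_distance(x, c0) else 0
            correct += pred == t
    return correct / len(Xs)


def exhaustive_search(X, y, k=5):
    """Score every non-empty feature subset; with 7 features that is 2**7 - 1."""
    n_features = len(X[0])
    results = {}
    for size in range(1, n_features + 1):
        for subset in itertools.combinations(range(n_features), size):
            results[subset] = cv_accuracy(X, y, subset, k)
    return results


X, y = make_synthetic_pairs()
results = exhaustive_search(X, y)
best = max(results, key=results.get)
print(f"{len(results)} subsets evaluated; best subset {best} "
      f"with CV accuracy {results[best]:.3f}")
```

With only seven features the full enumeration is cheap, which is what makes an exhaustive search practical here; comparing linear and nonlinear classifiers would amount to swapping the rule inside `cv_accuracy` while keeping the same folds.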

Item Type: Article
Organisations: Southampton Wireless Group
ePrint ID: 266704
Date: 1 August 2008 (Published)
Date Deposited: 24 Sep 2008 07:29
Last Modified: 17 Apr 2017 19:00