The University of Southampton
University of Southampton Institutional Repository

Using varying negative examples to improve computational predictions of transcription factor binding sites

Rezwan, Faisal, Sun, Yi, Davey, Neil, Adams, Rod, Rust, Alistair G. and Robinson, Mark (2012) Using varying negative examples to improve computational predictions of transcription factor binding sites. In Engineering Applications of Neural Networks - 13th International Conference, EANN 2012, Proceedings. vol. 311, Springer Berlin, Heidelberg. pp. 234-243. (doi:10.1007/978-3-642-32909-8_24).

Record type: Conference or Workshop Item (Paper)

Abstract

The identification of transcription factor binding sites (TFBSs) is a non-trivial problem because existing computational predictors produce many false predictions. Although combining these predictions with a meta-classifier, such as a Support Vector Machine (SVM), has been shown to improve overall results, the improvement is smaller than expected. The reason is that the predictors are unreliable on negative examples drawn from non-binding sites in the promoter region. Using negative examples from different sources when training an SVM is therefore one possible solution to this problem. In this study, we trained the classifier with different types of negative examples: examples taken from regions far from the promoters, examples produced by randomisation, and examples taken from the intronic regions of genes. We observed the effect of these negative examples on the prediction of TFBSs in yeast. We also used a modified cross-validation method for this type of problem. We observed a substantial improvement in classifier performance, which could constitute a model for predicting TFBSs. The major contribution of the analysis is therefore that, for the yeast genome, the position of binding sites can be predicted with high confidence using our technique, and the predictions are of much higher quality than those of the original prediction algorithms.
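
As an illustration only, the idea of swapping the source of negative examples can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the toy sequences, the names `shuffled_negative`, `build_training_set` and `negative_sources`, and the choice to randomise by shuffling the bases of a sequence are all hypothetical, since the abstract does not say how randomisation was performed or how examples were encoded. Training the SVM meta-classifier and the modified cross-validation are deliberately left out.

```python
import random
from collections import Counter


def shuffled_negative(sequence, rng):
    """Return a randomised negative: the same bases in a shuffled order.

    Assumption: randomisation is done by permuting bases, which keeps
    the base composition of the original sequence.
    """
    bases = list(sequence)
    rng.shuffle(bases)
    return "".join(bases)


def build_training_set(positives, negatives):
    """Label binding-site examples +1 and the chosen negatives -1."""
    return [(x, 1) for x in positives] + [(x, -1) for x in negatives]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical toy data; real examples would come from annotated genomes.
binding_sites = ["TGACTCA", "TGACTAA"]
negative_sources = {
    "distal": ["ACGTTGC", "GGCATAC"],    # stand-in for regions far from promoters
    "intronic": ["GTATGTT", "TTTACAG"],  # stand-in for intronic regions
    "randomised": [shuffled_negative(s, rng) for s in binding_sites],
}

# One training set per negative source; each could be passed to a classifier.
training_sets = {
    name: build_training_set(binding_sites, negatives)
    for name, negatives in negative_sources.items()
}
```

Each entry of `training_sets` pairs the same positives with a different kind of negative, so a classifier trained on each can be compared on a common held-out set.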

This record has no associated files available for download.

More information

Published date: 2012
Venue - Dates: 2012 International Conference on Artificial Intelligence and Computational Intelligence, AICI 2012, Chengdu, China, 2012-10-26 - 2012-10-28

Identifiers

Local EPrints ID: 413747
URI: http://eprints.soton.ac.uk/id/eprint/413747
ISSN: 18650929
PURE UUID: 0c1114be-307d-441f-aac4-29fd91628eb8
ORCID for Faisal Rezwan: orcid.org/0000-0001-9921-222X

Catalogue record

Date deposited: 04 Sep 2017 16:30
Last modified: 16 Mar 2024 04:13

Contributors

Author: Faisal Rezwan
Author: Yi Sun
Author: Neil Davey
Author: Rod Adams
Author: Alistair G. Rust
Author: Mark Robinson
