Improving transcription factor binding site predictions by using randomised negative examples
It is known that much of the genetic change underlying morphological evolution takes place in cis-regulatory regions rather than in the coding regions of genes. Identifying the binding sites in these regions of a genome is a non-trivial problem. Experimental methods for finding binding sites exist, but they are limited in applicability, accuracy, availability or cost. Prediction algorithms, on the other hand, perform rather poorly. The aim of this research is to develop and improve computational approaches for the prediction of transcription factor binding sites (TFBSs) by integrating the results of computational algorithms with other sources of complementary biological evidence, with particular emphasis on the use of the Support Vector Machine (SVM). Data from two organisms, yeast and mouse, were used in this study. The initial results were not particularly encouraging, as the predictions were still of low quality. However, when the vectors labelled as non-binding sites in the training set were replaced by randomised training vectors, a significant improvement in performance was observed: substantial for the yeast data and even greater for the mouse data. In fact, the resulting classifier found over 80% of the binding sites in the test set, and 80% of its predictions were correct.
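The two ideas in the abstract, replacing the labelled negatives with randomised vectors and reporting the share of binding sites found alongside the share of correct predictions, can be illustrated with a minimal sketch. The abstract does not say how the randomised vectors were generated, so the column-shuffling scheme below is an assumption, and the function names `randomise_negatives` and `recall_and_precision` are hypothetical. The SVM training step is omitted.

```python
import random


def randomise_negatives(negatives, seed=0):
    """Build synthetic negative vectors by shuffling each feature column
    independently across the negative examples.

    ASSUMPTION: this shuffling scheme is illustrative only; the record
    does not describe how the paper's randomised vectors were produced.
    """
    rng = random.Random(seed)
    # Transpose rows into per-feature columns, shuffle each column, transpose back.
    columns = [list(col) for col in zip(*negatives)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def recall_and_precision(y_true, y_pred):
    """Recall: fraction of true binding sites (label 1) that were predicted.
    Precision: fraction of predicted binding sites that are true sites."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Toy labels (not data from the paper): 5 true sites, 4 found, 1 false alarm.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
recall, precision = recall_and_precision(y_true, y_pred)
```

With these toy labels, both recall and precision come out at 0.8. That is the kind of result the abstract's closing sentence describes, although its figures come from the authors' own test set.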
225-237
Rezwan, Faisal
203f8f38-1f5d-485b-ab11-c546b4276338
Sun, Yi
52b4df91-6eec-4c04-8106-7cd195f1d0a6
Davey, Neil
45038a2a-60fa-475b-be2b-72b23c97bb0c
Adams, Rod
aba52023-234f-464a-b86f-504b200dc950
Rust, Alistair G.
27e6975d-abef-4037-a8ff-74b2a18cb687
Robinson, Mark
0191ef40-12cc-4b4d-9bcd-5547087add95
2012
Rezwan, Faisal, Sun, Yi, Davey, Neil, Adams, Rod, Rust, Alistair G. and Robinson, Mark (2012) Improving transcription factor binding site predictions by using randomised negative examples. In Information Processing in Cells and Tissues - 9th International Conference, IPCAT 2012, Proceedings. vol. 7223 LNCS, pp. 225-237. (doi:10.1007/978-3-642-28792-3_28).
Record type:
Conference or Workshop Item
(Paper)
This record has no associated files available for download.
More information
Published date: 2012
Venue - Dates:
9th International Conference on Information Processing in Cells and Tissues, IPCAT 2012, Cambridge, United Kingdom, 2012-03-31 - 2012-04-02
Identifiers
Local EPrints ID: 414017
URI: http://eprints.soton.ac.uk/id/eprint/414017
ISSN: 0302-9743
PURE UUID: 0a0551fd-6e57-4673-8648-4f7b339bab73
Catalogue record
Date deposited: 12 Sep 2017 16:31
Last modified: 06 Jun 2024 01:51
Contributors
Author:
Faisal Rezwan
Author:
Yi Sun
Author:
Neil Davey
Author:
Rod Adams
Author:
Alistair G. Rust
Author:
Mark Robinson