The University of Southampton
University of Southampton Institutional Repository

Improving transcription factor binding site predictions by using randomised negative examples

Rezwan, Faisal, Sun, Yi, Davey, Neil, Adams, Rod, Rust, Alistair G. and Robinson, Mark (2012) Improving transcription factor binding site predictions by using randomised negative examples. In Information Processing in Cells and Tissues - 9th International Conference, IPCAT 2012, Proceedings. vol. 7223 LNCS, pp. 225-237. (doi:10.1007/978-3-642-28792-3_28).

Record type: Conference or Workshop Item (Paper)

Abstract

It is known that much of the genetic change underlying morphological evolution takes place in cis-regulatory regions rather than in the coding regions of genes. Identifying the binding sites within these regions of a genome is a non-trivial problem. Experimental methods for finding binding sites exist, but they are limited in applicability, accuracy, availability or cost. Computational prediction algorithms, on the other hand, perform rather poorly. The aim of this research is to develop and improve computational approaches for predicting transcription factor binding sites (TFBSs) by integrating the results of computational algorithms with other sources of complementary biological evidence, with particular emphasis on the use of the Support Vector Machine (SVM). Data from two organisms, yeast and mouse, were used in this study. The initial results were not particularly encouraging, as the predictions were still of low quality. However, when the vectors labelled as non-binding sites in the training set were replaced by randomised training vectors, a significant improvement in performance was observed: substantial for the yeast data and even greater for the mouse data. In fact, the resulting classifier found over 80% of the binding sites in the test set, and 80% of its predictions were correct.
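
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the randomised vectors were generated, so the per-feature resampling used here is purely an assumption, and the function names (`randomise_vectors`, `build_training_set`, `recall_precision`) are hypothetical. The sketch only builds a training set whose negative class is randomised and computes the two quantities the abstract reports, the fraction of true binding sites found (recall) and the fraction of predictions that are correct (precision). Training the SVM itself is left out.

```python
import random


def randomise_vectors(pool, n_vectors, seed=0):
    """Create n_vectors synthetic vectors from a pool of real ones.

    ASSUMPTION: each feature is drawn independently from the values that
    feature takes in the pool. The abstract does not specify the actual
    randomisation scheme; this is only one plausible choice.
    """
    rng = random.Random(seed)
    n_features = len(pool[0])
    # Column j holds every observed value of feature j.
    columns = [[vec[j] for vec in pool] for j in range(n_features)]
    return [
        [rng.choice(columns[j]) for j in range(n_features)]
        for _ in range(n_vectors)
    ]


def build_training_set(positives, pool, n_negatives, seed=0):
    """Keep the positive vectors, replace the negatives by randomised ones."""
    negatives = randomise_vectors(pool, n_negatives, seed)
    X = list(positives) + negatives
    y = [1] * len(positives) + [0] * len(negatives)
    return X, y


def recall_precision(y_true, y_pred):
    """Return (recall, precision) for binary labels, 1 = binding site."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision
```

In such a setup, `X` and `y` would be passed to an SVM trainer of the reader's choice, and `recall_precision` would be evaluated on a held-out test set whose labels are real rather than randomised.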

This record has no associated files available for download.

More information

Published date: 2012
Venue - Dates: 9th International Conference on Information Processing in Cells and Tissues, IPCAT 2012, Cambridge, United Kingdom, 2012-03-31 - 2012-04-02

Identifiers

Local EPrints ID: 414017
URI: http://eprints.soton.ac.uk/id/eprint/414017
ISSN: 0302-9743
PURE UUID: 0a0551fd-6e57-4673-8648-4f7b339bab73
ORCID for Faisal Rezwan: orcid.org/0000-0001-9921-222X

Catalogue record

Date deposited: 12 Sep 2017 16:31
Last modified: 16 Mar 2024 04:13

Contributors

Author: Faisal Rezwan
Author: Yi Sun
Author: Neil Davey
Author: Rod Adams
Author: Alistair G. Rust
Author: Mark Robinson
