University of Southampton Institutional Repository

SLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins

Edwards, Richard J., Davey, Norman E. and Shields, Denis C. (2007) SLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins. PLoS ONE, 2 (10), e967-[11pp]. (doi:10.1371/journal.pone.0000967).

Record type: Article

Abstract

Background. Short linear motifs (SLiMs) in proteins are functional microdomains of fundamental importance in many biological systems. SLiMs typically consist of a 3 to 10 amino acid stretch of the primary protein sequence, of which as few as two sites may be important for activity, making the identification of novel SLiMs extremely difficult. In particular, it can be very difficult to distinguish a randomly recurring “motif” from a truly over-represented one. Incorporating ambiguous amino acid positions and/or variable-length wildcard spacers between defined residues further complicates the matter.
Methodology/Principal Findings. In this paper we present two algorithms. SLiMBuild identifies convergently evolved short motifs in a dataset of proteins. Motifs are built by combining dimers into longer patterns, retaining only those motifs that occur in a sufficient number of unrelated proteins. Motifs with fixed amino acid positions are identified first and then combined to incorporate amino acid ambiguity and variable-length wildcard spacers. The algorithm is computationally efficient compared with alternatives, particularly when datasets include homologous proteins, and offers great flexibility in the types of motif returned. The SLiMChance algorithm estimates the probability of each returned motif arising by chance, correcting for the size and composition of the dataset, and assigns it a significance value. Both algorithms are implemented in a software package, SLiMFinder. With its default settings, SLiMFinder identifies known SLiMs with 100% specificity and has a low false discovery rate on random test data.
Conclusions/Significance. The efficiency of SLiMBuild and the low false discovery rate of SLiMChance make SLiMFinder well suited both to high-throughput motif discovery and to individual high-quality analyses. Examples of such analyses on real biological data, and of how SLiMFinder results can help direct future discoveries, are provided. SLiMFinder is freely available for download under a GNU license from http://bioinformatics.ucd.ie/shields/software/slimfinder/.
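
The two steps summarised in the abstract (growing motifs from dimers, then scoring how likely each is to arise by chance) can be illustrated with a toy Python sketch. It is not SLiMFinder's code and does not reproduce the published statistics: the motif representation (a tuple alternating residues and wildcard-gap lengths), the helper names (build_motifs, support, chance_p_value, to_pattern), the assumption that every input sequence is unrelated to every other, and the binomial test with a Šidák-style correction are all illustrative assumptions. It handles only fixed residues separated by fixed-length wildcard gaps; ambiguous positions and variable-length spacers are left out.

```python
from collections import Counter
from itertools import product
from math import comb

# Hypothetical representation (not SLiMFinder's own): a motif is a tuple that
# alternates residues and wildcard-gap lengths, e.g. ("R", 1, "L", 0, "P") is R.LP.


def to_pattern(motif):
    """Render a motif as a regex-style string, e.g. ("R", 1, "L") -> "R.L"."""
    return "".join(item if i % 2 == 0 else "." * item for i, item in enumerate(motif))


def matches_at(seq, motif, start):
    """True if motif matches seq with its first residue at index start."""
    pos = start
    for i, item in enumerate(motif):
        if i % 2 == 0:  # defined residue
            if pos >= len(seq) or seq[pos] != item:
                return False
            pos += 1
        else:  # skip `item` wildcard positions
            pos += item
    return True


def support(seqs, motif):
    """Number of sequences containing the motif at least once.

    Assumption: all sequences are unrelated. The real tool counts unrelated
    proteins, which requires handling homologues; that is not modelled here.
    """
    return sum(any(matches_at(s, motif, i) for i in range(len(s))) for s in seqs)


def build_motifs(seqs, min_support=3, max_gap=1, max_defined=3):
    """Grow fixed-position motifs from dimers, pruning rare ones early.

    Pruning is safe because a motif can occur in no more sequences than any
    dimer it contains.
    """
    alphabet = sorted(set("".join(seqs)))
    dimers = [(a, g, b) for a, g, b in product(alphabet, range(max_gap + 1), alphabet)
              if support(seqs, (a, g, b)) >= min_support]
    found, current = list(dimers), list(dimers)
    for _ in range(max_defined - 2):
        extended = []
        for m in current:
            # Join the motif with each retained dimer that starts on its last residue.
            for a, g, b in dimers:
                if a == m[-1]:
                    cand = m + (g, b)
                    if support(seqs, cand) >= min_support:
                        extended.append(cand)
        found.extend(extended)
        current = extended
    return found


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def chance_p_value(seqs, motif, n_tested):
    """Simplified over-representation p-value (not the published SLiMChance formula)."""
    counts = Counter("".join(seqs))
    total = sum(counts.values())
    # Per-site match probability: product of background frequencies of defined residues.
    p_site = 1.0
    for residue in motif[::2]:
        p_site *= counts[residue] / total
    sites = max(int(total / len(seqs)) - len(to_pattern(motif)) + 1, 0)
    # Chance that a sequence of average length has >= 1 match (sites treated as independent).
    p_seq = 1.0 - (1.0 - p_site) ** sites
    raw = binomial_tail(support(seqs, motif), len(seqs), p_seq)
    # Sidak-style correction for the number of motifs tested.
    return 1.0 - (1.0 - raw) ** n_tested


if __name__ == "__main__":
    seqs = ["MARKLPQE", "GGRKLPAA", "TTRKLPWW", "MSDEFGHI"]
    motifs = build_motifs(seqs)
    for m in motifs:
        p = chance_p_value(seqs, m, n_tested=len(motifs))
        print(f"{to_pattern(m):<6} support={support(seqs, m)} p={p:.2e}")
```

On this toy input, the RKLP stretch shared by the first three sequences yields dimers such as RK and trimers such as RKL and R.LP, each present in three sequences. Passing len(motifs) as the number of tests is a simplification; a stricter correction would count every motif that could have been tested, not only those that survived pruning.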

Full text: Edwards_RJ,_Davey_NE_&_Shields_DC_(2007)_-_PLoS_One_2(10)-e967_-_SLiMFinder.pdf (Version of Record; restricted to repository staff only)

More information

Published date: 3 October 2007

Identifiers

Local EPrints ID: 48740
URI: http://eprints.soton.ac.uk/id/eprint/48740
ISSN: 1932-6203
PURE UUID: 42b02195-f665-4c75-9551-6f20d594d1d2

Catalogue record

Date deposited: 11 Oct 2007
Last modified: 15 Mar 2024 09:49

Contributors

Author: Richard J. Edwards
Author: Norman E. Davey
Author: Denis C. Shields
