Edwards, Richard J., Davey, Norman E. and Shields, Denis C.
SLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins.
PLoS ONE, 2, (10). (doi:10.1371/journal.pone.0000967).
Background. Short linear motifs (SLiMs) in proteins are functional microdomains of fundamental importance in many
biological systems. SLiMs typically consist of a 3 to 10 amino acid stretch of the primary protein sequence, of which as few as
two sites may be important for activity, making identification of novel SLiMs extremely difficult. In particular, it can be very
difficult to distinguish a randomly recurring "motif" from a truly over-represented one. Incorporating ambiguous amino acid
positions and/or variable-length wildcard spacers between defined residues further complicates the matter.
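To make the terminology concrete, a motif with an ambiguous position and a variable-length wildcard spacer can be written as a regular expression. The sketch below is illustrative only: the motif `R.{1,2}[ST]P` and the three sequences are invented for this example and are not taken from the paper.

```python
import re

# Hypothetical motif (an assumption for illustration): a fixed R, a wildcard
# spacer of 1 or 2 residues, an ambiguous position (S or T), then a fixed P.
MOTIF = re.compile(r"R.{1,2}[ST]P")

# Made-up sequences, not real proteins.
sequences = {
    "protA": "MKRASPLLG",
    "protB": "GGRQLTPAA",
    "protC": "MKKLLGGAA",
}

# Collect the matched substrings per sequence.
hits = {name: [m.group() for m in MOTIF.finditer(seq)]
        for name, seq in sequences.items()}

for name, found in hits.items():
    print(name, found)
```

Only two or three positions are fixed in such a pattern, which is why matches can easily recur by chance in unrelated sequences.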
Principal Findings. In this paper we present two algorithms. SLiMBuild identifies convergently evolved, short motifs in
a dataset of proteins. Motifs are built by combining dimers into longer patterns, retaining only those motifs occurring in
a sufficient number of unrelated proteins. Motifs with fixed amino acid positions are identified and then combined to
incorporate amino acid ambiguity and variable-length wildcard spacers. The algorithm is computationally efficient compared
to alternatives, particularly when datasets include homologous proteins, and provides great flexibility in the nature of motifs
returned. The SLiMChance algorithm estimates the probability of returned motifs arising by chance, correcting for the size and
composition of the dataset, and assigns a significance value to each motif. These algorithms are implemented in a software
package, SLiMFinder. SLiMFinder default settings identify known SLiMs with 100% specificity, and have a low false discovery
rate on random test data.
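The two ideas above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the SLiMFinder implementation: every input sequence is treated as unrelated to the others, only fixed positions and fixed-length wildcard spacers are built (no ambiguity), and the names `extensions`, `build_motifs` and `binomial_tail` are hypothetical. The binomial tail is one common way to score recurrence across sequences and is an assumption here; the per-sequence probability `p` is taken as given, and the corrections for dataset size and composition described in the abstract are not reproduced.

```python
from collections import defaultdict
from math import comb


def extensions(seq, i, max_wild):
    """Yield (suffix, j): a wildcard spacer plus the next fixed residue at j > i."""
    for gap in range(max_wild + 1):
        j = i + gap + 1
        if j < len(seq):
            yield "." * gap + seq[j], j


def build_motifs(seqs, min_support=2, max_wild=2, max_fixed=3):
    """Return {pattern: number of sequences containing it}.

    Dimers (two fixed residues) are found first; retained motifs are then
    extended by the dimer that starts at their last fixed position.
    """
    def prune(occ):
        # Keep only patterns found in enough (assumed unrelated) sequences.
        return {p: hits for p, hits in occ.items() if len(hits) >= min_support}

    # level[pattern][sequence index] = set of end positions of occurrences
    level = defaultdict(lambda: defaultdict(set))
    for s, seq in enumerate(seqs):
        for i in range(len(seq)):
            for suffix, j in extensions(seq, i, max_wild):
                level[seq[i] + suffix][s].add(j)
    level = prune(level)
    results = {p: len(hits) for p, hits in level.items()}

    for _ in range(max_fixed - 2):
        longer = defaultdict(lambda: defaultdict(set))
        for pattern, hits in level.items():
            for s, ends in hits.items():
                for end in ends:
                    for suffix, j in extensions(seqs[s], end, max_wild):
                        longer[pattern + suffix][s].add(j)
        level = prune(longer)
        results.update({p: len(hits) for p, hits in level.items()})
    return results


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more of n sequences matching."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Made-up sequences for illustration only.
seqs = ["AARKSPAA", "GGRQSPGG", "TTRLSPTT", "MMMMMMMM"]
motifs = build_motifs(seqs, min_support=3)
for pattern in sorted(motifs):
    print(pattern, motifs[pattern])
```

Pruning at each level is what keeps the search tractable: a longer pattern can only meet the support threshold if the shorter pattern it extends already did, so unsupported dimers are never extended.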
Conclusions/Significance. The efficiency of SLiMBuild and low false discovery rate of SLiMChance
make SLiMFinder highly suited to high throughput motif discovery and individual high quality analyses alike. Examples of such
analyses on real biological data, and how SLiMFinder results can help direct future discoveries, are provided. SLiMFinder is
freely available for download under a GNU license from http://bioinformatics.ucd.ie/shields/software/slimfinder/.