An evaluation of different classification algorithms for protein sequence-based reverse vaccinology prediction
Heinson, Ashley
822775d1-9379-4bde-99c3-3c031c3100fb
Ewing, Robert
022c5b04-da20-4e55-8088-44d0dc9935ae
Holloway, John
4bbd77e6-c095-445d-a36b-a50a72f6fe1a
Woelk, Christopher H.
4d3af0fd-658f-4626-b3b5-49a6192bcf7d
Niranjan, Mahesan
5cbaeea8-7288-4b55-a89c-c43d212ddd4f
13 December 2019
Heinson, Ashley, Ewing, Robert, Holloway, John, Woelk, Christopher H. and Niranjan, Mahesan (2019) An evaluation of different classification algorithms for protein sequence-based reverse vaccinology prediction. PLoS ONE, 14 (12), [e0226256]. (doi:10.1371/journal.pone.0226256).
Abstract
Previous work has shown that proteins with the potential to be vaccine candidates can be predicted from features derived from their amino acid sequences. In this work, we empirically compare a range of machine learning classifiers on this sequence-based inference problem. Using systematic cross-validation on a dataset of 200 known vaccine candidates and 200 negative examples, with a set of 525 features derived from the amino acid sequences and feature selection applied through greedy backward elimination, we show that simple classification algorithms often perform as well as more complex support vector kernel machines. The work also includes a novel cross-validation scheme applied across bacterial species: the validation proteins all come from a single bacterial species that is not represented in the training set. We term this type of validation Leave One Bacteria Out Validation (LOBOV).
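The LOBOV idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the nearest-centroid classifier is a simple stand-in for the classifiers compared in the paper, the function names (`lobov`, `nearest_centroid_fit`, `nearest_centroid_predict`) are hypothetical, and the toy feature values and species labels are invented for illustration. The only element taken from the record is the splitting rule, where every validation protein comes from one species that is absent from the training set.

```python
from collections import defaultdict


def centroid(rows):
    """Mean feature vector of a non-empty list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def nearest_centroid_fit(X, y):
    """Stand-in classifier (an assumption): one centroid per class label."""
    by_label = defaultdict(list)
    for features, label in zip(X, y):
        by_label[label].append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}


def nearest_centroid_predict(model, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(model, key=lambda label: sq_dist(model[label]))


def lobov(X, y, species):
    """Hold out each species in turn and score a model trained on the rest.

    Returns a dict mapping each held-out species to the accuracy obtained
    on its proteins.
    """
    scores = {}
    for held_out in sorted(set(species)):
        train = [i for i, s in enumerate(species) if s != held_out]
        test = [i for i, s in enumerate(species) if s == held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        scores[held_out] = correct / len(test)
    return scores


# Toy data (invented): two features per protein,
# label 1 = vaccine candidate, 0 = negative example.
X = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.9], [0.2, 0.1], [0.7, 0.7], [0.3, 0.2]]
y = [1, 0, 1, 0, 1, 0]
species = ["A", "A", "B", "B", "C", "C"]

scores = lobov(X, y, species)
```

Unlike ordinary k-fold cross-validation, the folds here are defined by species membership, so the score for each species reflects how well a model generalises to a bacterium it has never seen.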
Text: journal.pone.0226256 (Version of Record)
More information
Accepted/In Press date: 22 November 2019
e-pub ahead of print date: 13 December 2019
Published date: 13 December 2019
Identifiers
Local EPrints ID: 436685
URI: http://eprints.soton.ac.uk/id/eprint/436685
ISSN: 1932-6203
PURE UUID: 055e8d60-78de-47a5-a9e0-ed1b5ae61dfd
Catalogue record
Date deposited: 20 Dec 2019 18:30
Last modified: 17 Mar 2024 03:46
Contributors
Author: Ashley Heinson
Author: Robert Ewing
Author: John Holloway
Author: Christopher H. Woelk
Author: Mahesan Niranjan