The University of Southampton
University of Southampton Institutional Repository

An evaluation of different classification algorithms for protein sequence-based reverse vaccinology prediction


Heinson, Ashley, Ewing, Robert, Holloway, John, Woelk, Christopher H. and Niranjan, Mahesan (2019) An evaluation of different classification algorithms for protein sequence-based reverse vaccinology prediction. PLoS ONE, 14 (12), [E0226256]. (doi:10.1371/journal.pone.0226256).

Record type: Article

Abstract

Previous work has shown that proteins with the potential to be vaccine candidates can be predicted from features derived from their amino acid (AA) sequences. In this work, we make an empirical comparison of various machine learning classifiers on this sequence-based inference problem. Using systematic cross-validation on a dataset of 200 known vaccine candidates and 200 negative examples, with a set of 525 features derived from the AA sequences and feature selection applied through greedy backward elimination, we show that simple classification algorithms often perform as well as more complex support vector kernel machines. The work also includes a novel form of cross-validation applied across bacterial species, in which the validation proteins all come from a single bacterial species not represented in the training set. We term this type of validation Leave One Bacteria Out Validation (LOBOV).
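
The two procedures named in the abstract, validation across bacterial species and greedy backward feature elimination, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `lobov_splits` and `greedy_backward_elimination`, the stopping rule (stop when every single removal lowers the score), and the toy species labels are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


def lobov_splits(
    species: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_species, train_indices, validation_indices).

    Each fold holds out every protein from one species, so the validation
    species is never represented in the training set.
    """
    by_species: Dict[str, List[int]] = defaultdict(list)
    for index, name in enumerate(species):
        by_species[name].append(index)
    for held_out in sorted(by_species):
        validation = by_species[held_out]
        train = [i for i, name in enumerate(species) if name != held_out]
        yield held_out, train, validation


def greedy_backward_elimination(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    min_features: int = 1,
) -> List[str]:
    """Repeatedly drop the feature whose removal gives the best score.

    Assumes feature names are unique. Stops when every single removal
    would lower the current score (an assumed stopping rule), or when
    only `min_features` remain.
    """
    selected = list(features)
    best = score(selected)
    while len(selected) > min_features:
        # Score each candidate subset obtained by removing one feature.
        candidates = [
            (score([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        candidate_score, drop = max(candidates, key=lambda pair: pair[0])
        if candidate_score < best:
            break
        selected.remove(drop)
        best = candidate_score
    return selected


if __name__ == "__main__":
    # Hypothetical species label for each of five proteins (toy data).
    toy_species = ["species_a", "species_a", "species_b", "species_c", "species_b"]
    for held_out, train, validation in lobov_splits(toy_species):
        print(held_out, "train:", train, "validation:", validation)
```

In practice, `score` would wrap a cross-validated classifier evaluation on the chosen feature subset; it is left as a caller-supplied function here so that the sketch stays independent of any particular classifier or library.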

Text
journal.pone.0226256 - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 22 November 2019
e-pub ahead of print date: 13 December 2019
Published date: 13 December 2019

Identifiers

Local EPrints ID: 436685
URI: http://eprints.soton.ac.uk/id/eprint/436685
ISSN: 1932-6203
PURE UUID: 055e8d60-78de-47a5-a9e0-ed1b5ae61dfd
ORCID for Robert Ewing: orcid.org/0000-0001-6510-4001
ORCID for John Holloway: orcid.org/0000-0001-9998-0464

Catalogue record

Date deposited: 20 Dec 2019 18:30
Last modified: 27 Jan 2020 13:45
