Improving reverse vaccinology with a machine learning approach
Pages: 8156-8164
Bowman, Brett N., McAdam, Paul R., Vivona, Sandro, Zhang, Jin X., Luong, Tiffany, Belew, Richard K., Sahota, Harpal, Guiney, Donald, Valafar, Faramarz, Fierer, Joshua and Woelk, C.H. (2011) Improving reverse vaccinology with a machine learning approach. Vaccine, 29 (45), 8156-8164. (doi:10.1016/j.vaccine.2011.07.142). (PMID:21864619)
Abstract
Reverse vaccinology aims to accelerate subunit vaccine design by rapidly predicting which proteins in a pathogenic bacterial proteome are putative protective antigens. Support vector machine classification is a machine learning approach that has been applied to solve numerous classification problems in biological sciences but has not previously been incorporated into a reverse vaccinology approach. A training data set of 136 bacterial protective antigens paired with 136 non-antigens was constructed and bioinformatic tools were used to annotate this data for predicted protein features, many of which are associated with antigenicity (i.e. extracellular localization, signal peptides and B-cell epitopes). Annotation was used to train support vector machine classifiers that exhibited a maximum accuracy of 92% for discriminating protective antigens from non-antigens as assessed by a leave-tenth-out cross validation approach. These accuracies were superior to those achieved when annotating training data with auto and cross covariance transformations of z-descriptors for hydrophobicity, molecular size and polarity, or when classification was performed using regression methods. To further validate support vector machine classifiers, they were used to rank all the proteins in six bacterial proteomes for their antigenicity. Protective antigens from the training data were significantly recalled (enriched) in the top 75 ranked proteins for all six proteomes as assessed by a Fisher’s exact test (p < 0.05). This paper describes a superior workflow for performing reverse vaccinology studies and provides a benchmark training data set that can be used to evaluate future methodological improvements.
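The enrichment check mentioned in the abstract can be illustrated with a minimal sketch of a one-sided Fisher's exact test, computed from the hypergeometric distribution. The function name, its parameters and the example numbers are hypothetical and are not taken from the paper. This record does not say whether the authors used a one-sided or two-sided test, so the one-sided form here is an assumption.

```python
from math import comb


def fisher_enrichment_p(hits_in_top, top_n, total_hits, proteome_size):
    """One-sided Fisher's exact test for enrichment (hypothetical helper).

    Returns the probability of seeing at least `hits_in_top` known
    protective antigens among the `top_n` highest-ranked proteins if the
    ranking were unrelated to antigenicity (hypergeometric upper tail).
    """
    max_hits = min(top_n, total_hits)
    # Number of ways to choose any top_n proteins from the proteome.
    all_selections = comb(proteome_size, top_n)
    # Number of selections containing k known antigens, summed over
    # every k at least as extreme as the observed count.
    tail = sum(
        comb(total_hits, k) * comb(proteome_size - total_hits, top_n - k)
        for k in range(hits_in_top, max_hits + 1)
    )
    return tail / all_selections


# Illustrative numbers only (assumed, not from the paper): a proteome of
# 2000 proteins containing 10 known protective antigens, 4 of which are
# ranked in the top 75.
p_value = fisher_enrichment_p(hits_in_top=4, top_n=75,
                              total_hits=10, proteome_size=2000)
print(f"one-sided p = {p_value:.3g}")
```

With these assumed counts fewer than one known antigen would be expected in the top 75 by chance, so four hits yield a p-value well under the 0.05 threshold quoted in the abstract.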
This record has no associated files available for download.
More information
e-pub ahead of print date: 22 August 2011
Published date: October 2011
Keywords: reverse vaccinology, bacterial pathogens, protective antigen, support vector machines
Organisations: Clinical & Experimental Sciences
Identifiers
Local EPrints ID: 350568
URI: http://eprints.soton.ac.uk/id/eprint/350568
PURE UUID: a26f0b62-9d07-462b-b780-e8fcfdc158ec
Catalogue record
Date deposited: 05 Apr 2013 14:52
Last modified: 14 Mar 2024 13:27
Contributors
Author: Brett N. Bowman
Author: Paul R. McAdam
Author: Sandro Vivona
Author: Jin X. Zhang
Author: Tiffany Luong
Author: Richard K. Belew
Author: Harpal Sahota
Author: Donald Guiney
Author: Faramarz Valafar
Author: Joshua Fierer
Author: C.H. Woelk