Enhancing the biological relevance of machine learning classifiers for reverse vaccinology
Reverse vaccinology (RV) is a bioinformatics approach that predicts antigens with protective potential from the protein-coding genomes of bacterial pathogens for subunit vaccine design. RV became firmly established following the development of the BEXSERO® vaccine against Neisseria meningitidis serogroup B. RV studies have begun to incorporate machine learning (ML) techniques to distinguish bacterial protective antigens (BPAs) from non-BPAs. This research contributes to the RV field by using permutation analysis to demonstrate that a signal for protective antigens can be curated from published data. It also assesses the effects of the following on an ML approach to RV: nested cross-validation, balancing the selection of non-BPAs for subcellular localization, increasing the training data, and incorporating a greater number of protein annotation tools for feature generation. These enhancements yielded a support vector machine (SVM) classifier that could discriminate BPAs (n = 200) from non-BPAs (n = 200) with an area under the curve (AUC) of 0.787. In addition, hierarchical clustering of BPAs revealed that intracellular BPAs clustered separately from extracellular BPAs. However, no immediate benefit was derived from training SVM classifiers on data sets containing exclusively intra- or extracellular BPAs. In conclusion, this work demonstrates that ML classifiers have great utility in RV approaches and will lead to new subunit vaccines in the future.
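As an illustration of the permutation analysis and AUC mentioned in the abstract, the following is a minimal sketch only. It is not the authors' pipeline, which this record does not describe. The function names (auc, permutation_p_value), the toy scores, and the choice of shuffling labels against fixed classifier scores are all assumptions made for the example. The idea is that if the labels carry a real signal, the observed AUC should exceed the AUC obtained for most random relabelings.

```python
import random


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def permutation_p_value(scores, labels, n_permutations=1000, seed=0):
    """Shuffle the labels repeatedly and count how often the shuffled AUC
    is at least as large as the observed AUC.

    Returns (observed_auc, p_value). The +1 terms keep the estimated
    p-value strictly above zero.
    """
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if auc(scores, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Hypothetical toy data: 20 "BPA" scores drawn slightly higher than
    # 20 "non-BPA" scores. These are not values from the paper.
    rng = random.Random(42)
    toy_scores = [rng.gauss(1.0, 1.0) for _ in range(20)]
    toy_scores += [rng.gauss(0.0, 1.0) for _ in range(20)]
    toy_labels = [1] * 20 + [0] * 20
    observed, p = permutation_p_value(toy_scores, toy_labels, n_permutations=500)
    print(f"observed AUC = {observed:.3f}, permutation p-value = {p:.4f}")
```

In a fuller analysis the classifier would normally be retrained on each permuted label set rather than reusing fixed scores; the simpler variant here keeps the example short and dependent only on the standard library.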
Heinson, Ashley
822775d1-9379-4bde-99c3-3c031c3100fb
Gunawardana, Yawwani P
e7a9c0f0-8452-43f8-8623-24be36ef5cb3
Moesker, Bastiaan
4d8a2308-e949-4c0d-8ad5-6099b5c8aa09
Denman Hume, Carmen C.
bc2f7921-b191-4d3e-87b3-0c5116bbc545
Vataga, Elena
a7bbb165-96a2-4235-916e-a38eafa7a0a2
Hall, Yper
84a1a1ae-829f-4522-b9e6-b55f96d5d660
Stylianou, Elena
9d0e8222-1353-4f94-bd1d-76de8ab825ad
Mcshane, Helen
08d12cb0-42b4-40f7-ad20-6294f4ddd747
Williams, Ann
9cc09f36-22cb-422d-a79e-8b3eab1bdb49
Niranjan, Mahesan
5cbaeea8-7288-4b55-a89c-c43d212ddd4f
Woelk, Christopher H.
4d3af0fd-658f-4626-b3b5-49a6192bcf7d
Heinson, Ashley, Gunawardana, Yawwani P, Moesker, Bastiaan, Denman Hume, Carmen C., Vataga, Elena, Hall, Yper, Stylianou, Elena, Mcshane, Helen, Williams, Ann, Niranjan, Mahesan and Woelk, Christopher H. (2017) Enhancing the biological relevance of machine learning classifiers for reverse vaccinology. International Journal of Molecular Sciences, 18 (2). (doi:10.3390/ijms18020312).
Text: ijms-18-00312-v3 - Version of Record
More information
Accepted/In Press date: 17 January 2017
e-pub ahead of print date: 1 February 2017
Identifiers
Local EPrints ID: 446082
URI: http://eprints.soton.ac.uk/id/eprint/446082
ISSN: 1422-0067
PURE UUID: 3bbfa81c-87d1-40c0-b337-bc24a23f762a
Catalogue record
Date deposited: 20 Jan 2021 17:31
Last modified: 17 Mar 2024 03:46
Contributors
Author: Ashley Heinson
Author: Yawwani P Gunawardana
Author: Bastiaan Moesker
Author: Carmen C. Denman Hume
Author: Elena Vataga
Author: Yper Hall
Author: Elena Stylianou
Author: Helen Mcshane
Author: Ann Williams
Author: Mahesan Niranjan
Author: Christopher H. Woelk