The University of Southampton
University of Southampton Institutional Repository

SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data


Epping, Lennard, Van Tonder, Andries J., Gladstone, Rebecca, Bentley, Stephen D, Page, Andrew J. and Keane, Jacqueline A., The Global Pneumococcal Sequencing Consortium (2018) SeroBA: rapid high-throughput serotyping of Streptococcus pneumoniae from whole genome sequence data. Microbial Genomics, 4 (7), 1-6. (doi:10.1099/mgen.0.000186).

Record type: Article

Abstract

Streptococcus pneumoniae is responsible for 240 000–460 000 deaths in children under 5 years of age each year. Accurate identification of pneumococcal serotypes is important for tracking the distribution and evolution of serotypes following the introduction of effective vaccines. Recent efforts have been made to infer serotypes directly from genomic data but current software approaches are limited and do not scale well. Here, we introduce a novel method, SeroBA, which uses a k-mer approach. We compare SeroBA against real and simulated data and present results on the concordance and computational performance against a validation dataset, the robustness and scalability when analysing a large dataset, and the impact of varying the depth of coverage on sequence-based serotyping. SeroBA can predict serotypes, by identifying the cps locus, directly from raw whole genome sequencing read data with 98 % concordance using a k-mer-based method, can process 10 000 samples in just over 1 day using a standard server and can call serotypes at a coverage as low as 15–21×. SeroBA is implemented in Python3 and is freely available under an open source GPLv3 licence from: https://github.com/sanger-pathogens/seroba
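
The abstract mentions a k-mer approach without describing it. As a rough illustration of the general idea only, the sketch below scores toy reference sequences by the fraction of their k-mers that also occur in a set of reads. This is not SeroBA's implementation: the function names, the toy sequences and the choice of k=5 are assumptions made for this example, and reverse complements, sequencing errors and k-mer counts are all ignored.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings (k-mers) of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_reference(reads, references, k=5):
    """Score each reference by the fraction of its k-mers seen in the reads.

    Hypothetical helper for illustration; returns the best-scoring
    reference name together with all scores.
    """
    # Pool the k-mers from every read into one set.
    read_kmers = set()
    for read in reads:
        read_kmers |= kmers(read, k)

    scores = {}
    for name, ref_seq in references.items():
        ref_kmers = kmers(ref_seq, k)
        # Fraction of the reference's k-mers that are covered by the reads.
        scores[name] = len(ref_kmers & read_kmers) / len(ref_kmers) if ref_kmers else 0.0

    best = max(scores, key=scores.get)
    return best, scores


# Toy data (made up for this sketch, not real cps locus sequences).
references = {
    "type_A": "ACGTACGGTCAGTTCA",
    "type_B": "TTGACCGATAGGCATC",
}
reads = ["ACGTACGGTC", "CGGTCAGTTCA"]

best, scores = best_reference(reads, references, k=5)
print(best, scores)
```

In this toy case the two reads together cover every 5-mer of `type_A` and none of `type_B`, so `type_A` is selected. Real tools work with much longer k-mers and far larger read sets.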

Full text: Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 4 May 2018
e-pub ahead of print date: 15 June 2018
Published date: 1 July 2018
Keywords: Whole genome sequencing, Streptococcus pneumoniae, pneumococcal, Serotyping, k-mer method

Identifiers

Local EPrints ID: 432085
URI: http://eprints.soton.ac.uk/id/eprint/432085
ISSN: 2057-5858
PURE UUID: 558561d4-9e6f-45e8-8a3a-d0f868250c8d
ORCID for Stuart Clarke: orcid.org/0000-0002-7009-1548

Catalogue record

Date deposited: 01 Jul 2019 16:30
Last modified: 16 Mar 2024 03:52

Contributors

Author: Lennard Epping
Author: Andries J. Van Tonder
Author: Rebecca Gladstone
Author: Stuart Clarke
Author: Stephen D Bentley
Author: Andrew J. Page
Author: Jacqueline A. Keane
Corporate Author: The Global Pneumococcal Sequencing Consortium
