University of Southampton Institutional Repository

Predicting glycan structure from tandem mass spectrometry via deep learning

Urban, James, Jin, Chunsheng, Thomsson, Kristina A., Karlsson, Niclas G., Ives, Callum M., Fadda, Elisa and Bojar, Daniel (2024) Predicting glycan structure from tandem mass spectrometry via deep learning. Nature Methods, 21 (7), 1206-1215. (doi:10.1038/s41592-024-02314-6).

Record type: Article

Abstract

Glycans constitute the most complicated post-translational modification, modulating protein activity in health and disease. However, structural annotation from tandem mass spectrometry (MS/MS) data is a bottleneck in glycomics, preventing high-throughput endeavors and relegating glycomics to a few experts. Trained on a newly curated set of 500,000 annotated MS/MS spectra, here we present CandyCrunch, a dilated residual neural network predicting glycan structure from raw liquid chromatography–MS/MS data in seconds (top-1 accuracy: 90.3%). We developed an open-access Python-based workflow of raw data conversion and prediction, followed by automated curation and fragment annotation, with predictions recapitulating and extending expert annotation. We demonstrate that this can be used for de novo annotation, diagnostic fragment identification and high-throughput glycomics. For maximum impact, this entire pipeline is tightly interlaced with our glycowork platform and can be easily tested at https://colab.research.google.com/github/BojarLab/CandyCrunch/blob/main/CandyCrunch.ipynb. We envision CandyCrunch democratizing structural glycomics and the elucidation of biological roles of glycans.

Text
s41592-024-02314-6 - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 17 May 2024
Published date: 1 July 2024
Additional Information: Publisher Copyright: © The Author(s) 2024.

Identifiers

Local EPrints ID: 500250
URI: http://eprints.soton.ac.uk/id/eprint/500250
ISSN: 1548-7091
PURE UUID: 75561460-5642-4bd5-bde4-30cd887c7467
ORCID for Elisa Fadda: orcid.org/0000-0002-2898-7770

Catalogue record

Date deposited: 23 Apr 2025 16:43
Last modified: 22 Aug 2025 02:42

Contributors

Author: James Urban
Author: Chunsheng Jin
Author: Kristina A. Thomsson
Author: Niclas G. Karlsson
Author: Callum M. Ives
Author: Elisa Fadda
Author: Daniel Bojar
