Protein NMR assignment by isotope pattern recognition
The current standard method for amino acid signal identification in protein NMR spectra is sequential assignment using triple-resonance experiments. Good software and elaborate heuristics exist, but the process remains laboriously manual. Machine learning does help, but its training databases need millions of samples that cover all relevant physics and every kind of instrumental artifact. In this communication, we offer a solution to this problem. We propose polyadic decompositions to store millions of simulated three-dimensional NMR spectra, on-the-fly generation of artifacts during training, a probabilistic way to incorporate prior and posterior information, and integration with the industry standard CcpNmr software framework. The resulting neural nets take [1H,13C] slices of mixed pyruvate–labeled HNCA spectra (different CA signal shapes for different residue types) and return an amino acid probability table. In combination with primary sequence information, backbones of common proteins (GB1, MBP, and INMT) are rapidly assigned from just the HNCA spectrum.
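The idea of combining an amino acid probability table with primary sequence information can be illustrated with a small sketch. This is not the authors' procedure, which the record does not describe. It only shows one simple way such a combination could work: each residue-type probability is multiplied by that residue's frequency in the sequence and the result is renormalized. The function name, the three-letter toy alphabet, the toy sequence, and all numbers are hypothetical, and the sketch assumes the network was trained with equal numbers of each residue type.

```python
from collections import Counter


def reweight_by_composition(net_probs, sequence):
    """Multiply per-residue-type probabilities by sequence composition
    and renormalize (hypothetical helper, for illustration only)."""
    counts = Counter(sequence)
    total = len(sequence)
    # Residue types absent from the sequence get zero weight.
    unnorm = {aa: p * counts.get(aa, 0) / total for aa, p in net_probs.items()}
    norm = sum(unnorm.values())
    if norm == 0.0:
        raise ValueError("no residue type is supported by both inputs")
    return {aa: w / norm for aa, w in unnorm.items()}


# Hypothetical network output for one spectrum slice (made-up numbers).
net_probs = {"A": 0.5, "G": 0.3, "W": 0.2}
# Toy primary sequence that contains no tryptophan.
sequence = "AAGAGA"

posterior = reweight_by_composition(net_probs, sequence)
for residue, prob in sorted(posterior.items()):
    print(f"{residue}: {prob:.3f}")
```

In this toy case tryptophan is ruled out because it does not occur in the sequence, and the remaining probability is shared between alanine and glycine in proportion to both the network output and their abundance.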
Rasulov, Uluk
Wang, Harrison K.
Viennet, Thibault
Droemer, Maxim A.
Matosin, Srđan
Schindler, Sebastian
Sun, Zhen-Yu J.
Mureddu, Luca
Vuister, Geerten W.
Robson, Scott A.
Arthanari, Haribabu
Kuprov, Ilya
Rasulov, Uluk, Wang, Harrison K., Viennet, Thibault, Droemer, Maxim A., Matosin, Srđan, Schindler, Sebastian, Sun, Zhen-Yu J., Mureddu, Luca, Vuister, Geerten W., Robson, Scott A., Arthanari, Haribabu and Kuprov, Ilya (2024) Protein NMR assignment by isotope pattern recognition. Science Advances, 10 (36), [eado0403]. (doi:10.1126/sciadv.ado0403).
More information
Accepted/In Press date: 29 July 2024
e-pub ahead of print date: 4 September 2024
Additional Information:
The authors acknowledge the use of the IRIDIS High Performance Computing Facility and associated support services at the University of Southampton in the completion of this work.
Identifiers
Local EPrints ID: 494988
URI: http://eprints.soton.ac.uk/id/eprint/494988
ISSN: 2375-2548
PURE UUID: 38d59061-502b-4d56-a178-9d56dbfe0de8
Catalogue record
Date deposited: 24 Oct 2024 16:50
Last modified: 25 Oct 2024 01:44
Contributors
Author: Uluk Rasulov
Author: Harrison K. Wang
Author: Thibault Viennet
Author: Maxim A. Droemer
Author: Srđan Matosin
Author: Sebastian Schindler
Author: Zhen-Yu J. Sun
Author: Luca Mureddu
Author: Geerten W. Vuister
Author: Scott A. Robson
Author: Haribabu Arthanari
Author: Ilya Kuprov