AI3SD Video: Lessons learned from generative models of biological sequences
Zelezniak, Aleksej (2021) AI3SD Video: Lessons learned from generative models of biological sequences. Frey, Jeremy G., Kanza, Samantha and Niranjan, Mahesan (eds.) AI 4 Proteins Seminar Series 2021. 14 Apr - 17 Jun 2021. (doi:10.5258/SOTON/P0101).
Record type: Conference or Workshop Item (Other)
Abstract
De novo protein design for catalysis of any desired chemical reaction is a long-standing goal in protein engineering because of its broad spectrum of technological, scientific and medical applications. However, mapping protein sequence to protein function is currently neither computationally nor experimentally tractable. Here, I will present the recently developed ProteinGAN approach, a self-attention-based variant of the generative adversarial network that is able to ‘learn’ natural protein sequence diversity and enables the generation of functional protein sequences. ProteinGAN learns the evolutionary relationships of protein sequences directly from the complex multidimensional amino-acid sequence space and creates new, highly diverse sequence variants with natural-like physical properties. Using malate dehydrogenase (MDH) as a template enzyme, we show that 24% (13 out of 55) of the ProteinGAN-generated and experimentally tested sequences are soluble and display MDH catalytic activity under the tested conditions in vitro, including a highly mutated variant with 106 amino-acid substitutions. ProteinGAN therefore demonstrates the potential of artificial intelligence to rapidly generate highly diverse functional proteins within the allowed biological constraints of the sequence space.
The talk is based on recently published work:
Repecka, D., Jauniskis, V., Karpus, L. et al. Expanding functional protein sequence spaces using generative adversarial networks. Nat Mach Intell 3, 324–333 (2021). https://doi.org/10.1038/s42256-021-00310-5
Video
AI4Proteins-Seminar-Series-AleksejZelezniak-170621
- Version of Record
More information
Published date: 17 June 2021
Additional Information:
Aleksej Zelezniak is a tenured Associate Professor and SciLifeLab fellow at Chalmers University of Technology, Gothenburg, Sweden. He received an MSc degree in Bioinformatics from the Technical University of Denmark and completed a PhD at the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany, developing network-based omics data integration methods for studying metabolic networks. For his postdoctoral training as an EMBO fellow, he joined the lab of Dr Markus Ralser at the University of Cambridge and the Francis Crick Institute, London, developing applications of machine learning to high-throughput mass spectrometry data. Since 2017 he has led an independent research group developing machine learning approaches for de novo protein and DNA design for biotechnology and synthetic biology applications.
Venue - Dates:
AI 4 Proteins Seminar Series 2021, 2021-04-14 - 2021-06-17
Keywords:
AI, AI3SD Event, Artificial Intelligence, Machine Intelligence, Machine Learning, ML, Proteins
Identifiers
Local EPrints ID: 450161
URI: http://eprints.soton.ac.uk/id/eprint/450161
PURE UUID: b25917bc-b9d5-44ee-a99e-f0a56a544a7e
Catalogue record
Date deposited: 14 Jul 2021 16:33
Last modified: 17 Mar 2024 03:51
Contributors
Author:
Aleksej Zelezniak
Editor:
Mahesan Niranjan