The University of Southampton
University of Southampton Institutional Repository

A universal foundation model for transfer learning in molecular crystals


Feng, Minggao, Zhao, Chengxi, Day, Graeme M., Evangelopoulos, Xenophon and Cooper, Andrew I (2025) A universal foundation model for transfer learning in molecular crystals. Chemical Science. (doi:10.26434/chemrxiv-2024-gn2rv-v2).

Record type: Article

Abstract

The physical and chemical properties of molecular crystals are a combined function of molecular structure and the molecular crystal packing. Specific crystal packings can enable applications such as pharmaceuticals, organic electronics, and porous materials for gas storage. However, to design such materials, we need to predict both crystal structure and the resulting physical properties, and this is expensive using traditional computational methods. Machine-learned interatomic potential methods offer major accelerations here, but molecular crystal structure prediction remains challenging due to the weak intermolecular interactions that dictate crystal packing. Moreover, machine-learned interatomic potentials do not accelerate the prediction of all physical properties for molecular crystals. Here we present Molecular Crystal Representation from Transformers (MCRT), a transformer-based model for molecular crystal property prediction that is pre-trained on 706,126 experimental crystal structures extracted from the Cambridge Structural Database (CSD). MCRT employs four different pre-training tasks to extract both local and global representations from the crystals using multi-modal features to encode crystal structure and geometry. MCRT has the potential to serve as a universal foundation model for predicting a range of properties for molecular crystals, achieving state-of-the-art results even when fine-tuned on small-scale datasets. We demonstrate MCRT’s practical utility in both crystal property prediction and crystal structure prediction. We also show that model predictions can be interpreted by using attention scores.

Text
a-universal-foundation-model-for-transfer-learning-in-molecular-crystals - Author's Original
Available under License Creative Commons Attribution.
Download (37MB)
Text
MCRT___A_Universal_Foundation_Model_for_Transfer_Learning_in_Molecular_Crystals - Accepted Manuscript
Download (59MB)

More information

Accepted/In Press date: 5 May 2025
e-pub ahead of print date: 21 May 2025

Identifiers

Local EPrints ID: 502297
URI: http://eprints.soton.ac.uk/id/eprint/502297
ISSN: 1478-6524
PURE UUID: 49c9fd16-59c7-439b-91b7-91b7bc0a3a25
ORCID for Graeme M. Day: orcid.org/0000-0001-8396-2771

Catalogue record

Date deposited: 20 Jun 2025 16:39
Last modified: 22 Aug 2025 02:07

Contributors

Author: Minggao Feng
Author: Chengxi Zhao
Author: Graeme M. Day
Author: Xenophon Evangelopoulos
Author: Andrew I Cooper
