The University of Southampton
University of Southampton Institutional Repository

Can machine learning predict the space group preference of organic molecules?

Gittins, Hannah and Day, Graeme M. (2026) Can machine learning predict the space group preference of organic molecules? Crystal Growth & Design. (In Press)

Record type: Article

Abstract

Crystal structure prediction (CSP) is a valuable computational technique used to anticipate the likely crystal structures of a compound of interest. These methods have proven useful in the research and development of pharmaceutical solid forms and in guiding the discovery of materials with targeted properties. Despite the success of CSP in these areas, its widespread application remains limited by computational cost. One approach to reducing the computational cost of CSP is to limit the search space of generated crystal structures; it is common practice to restrict the search to a selection of the most frequently observed space groups, with the associated risk of excluding the space group of an observed crystal structure. In an attempt to reduce computational cost and ambiguity when choosing a set of space groups for CSP, we investigate the use of machine learning models to predict the most likely space group(s) of a given organic molecule. We find that both random forests and graph neural networks provide accuracies far above random, and better than those achieved by selecting space groups based on their overall frequencies in organic molecular crystals. The best model, a graph neural network, achieves a maximum accuracy of 47.2% for single (top-1) space group prediction, an improvement of 8.2% over this frequency-based reference. This model was trained with three-dimensional molecular information, which improved accuracy compared to a model trained with only two-dimensional bonding information. Furthermore, we found that random forest models performed best when both chemical and geometric molecular features were included in training, which indicates that both are important in defining a molecule’s preferred space groups.
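The abstract compares learned models against a reference that ranks space groups by their overall frequency, and reports top-1 accuracy. A minimal Python sketch of those two ideas follows, assuming each molecule has exactly one observed space group label and that a classifier outputs one probability per candidate space group. The function names, class list and toy labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter
from typing import Sequence


def frequency_ranking(train_labels: Sequence[str]) -> list[str]:
    """Rank space groups by how often they occur in the training labels.

    The ranking is identical for every molecule: it ignores molecular
    structure entirely, which is what makes it a reference baseline.
    """
    return [sg for sg, _ in Counter(train_labels).most_common()]


def rank_from_probabilities(classes: Sequence[str], probs: Sequence[float]) -> list[str]:
    """Order candidate space groups by descending predicted probability for one molecule."""
    if len(classes) != len(probs):
        raise ValueError("classes and probs must have the same length")
    order = sorted(range(len(classes)), key=lambda i: probs[i], reverse=True)
    return [classes[i] for i in order]


def top_k_accuracy(true_labels: Sequence[str],
                   ranked_predictions: Sequence[Sequence[str]],
                   k: int) -> float:
    """Fraction of molecules whose observed space group appears in the top k of its ranking."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not true_labels or len(true_labels) != len(ranked_predictions):
        raise ValueError("need one ranked prediction per label, and at least one label")
    hits = sum(label in ranked[:k] for label, ranked in zip(true_labels, ranked_predictions))
    return hits / len(true_labels)


if __name__ == "__main__":
    # Hypothetical toy labels for illustration only; not data from the study.
    train = ["P21/c"] * 5 + ["P-1"] * 3 + ["P212121"] * 2 + ["C2/c"]
    test = ["P21/c", "P-1", "C2/c", "P21/c"]

    # Reference: the same frequency-ranked list is used for every test molecule.
    reference = frequency_ranking(train)
    reference_preds = [reference] * len(test)
    print(top_k_accuracy(test, reference_preds, k=1))  # 0.5
    print(top_k_accuracy(test, reference_preds, k=2))  # 0.75

    # Hypothetical per-molecule probabilities from some trained classifier.
    classes = ["P21/c", "P-1", "P212121", "C2/c"]
    probs = [
        [0.60, 0.20, 0.10, 0.10],
        [0.30, 0.50, 0.10, 0.10],
        [0.45, 0.10, 0.10, 0.35],
        [0.55, 0.30, 0.10, 0.05],
    ]
    model_preds = [rank_from_probabilities(classes, p) for p in probs]
    print(top_k_accuracy(test, model_preds, k=1))  # 0.75
    print(top_k_accuracy(test, model_preds, k=2))  # 1.0
```

With real data, the reference ranking would be built from the training-set space group frequencies, and the top-1 case corresponds to the single-space-group accuracy quoted above. For comparison, a uniformly random guess over N candidate space groups has an expected top-k accuracy of k/N.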

Accepted Manuscript (text): Can_machine_learning_predict_the_space_group_preference_of_organic_molecules
Restricted to Repository staff only until 7 April 2027.
Supporting Information (text): SI for Can_machine_learning_predict_the_space_group_preference_of_organic_molecules
Restricted to Repository staff only.

More information

Accepted/In Press date: 7 April 2026

Identifiers

Local EPrints ID: 511531
URI: http://eprints.soton.ac.uk/id/eprint/511531
ISSN: 1528-7483
PURE UUID: 84a9ace2-83ba-486f-9064-bf5e52d8f093
ORCID for Hannah Gittins: orcid.org/0009-0003-1032-2871
ORCID for Graeme M. Day: orcid.org/0000-0001-8396-2771

Catalogue record

Date deposited: 19 May 2026 16:31
Last modified: 21 May 2026 02:09

Contributors

Author: Hannah Gittins
Author: Graeme M. Day

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.