Can machine learning predict the space group preference of organic molecules?
Crystal structure prediction (CSP) is a valuable computational technique used to anticipate the likely crystal structures of a compound of interest. These methods have proven useful in the research and development of pharmaceutical solid forms and in guiding the discovery of materials with targeted properties. Despite the success of CSP in these areas, its widespread application remains limited by computational cost. One approach to reducing this cost is to limit the search space of generated crystal structures; it is common practice to restrict the search to a selection of the most frequently observed space groups, at the risk of excluding the space group of an observed crystal structure. To reduce both the computational cost and the ambiguity in choosing a set of space groups for CSP, we investigate the use of machine learning models to predict the most likely space group(s) of a given organic molecule. We find that both random forests and graph neural networks provide accuracies far above random, and better than those achieved by selecting space groups according to their overall frequencies in organic molecular crystals. The best model, a graph neural network, achieves a maximum accuracy of 47.2% for single (top-1) space group prediction, an improvement of 8.2% over the reference. This model was trained with 3-dimensional molecular information, which improved accuracies compared to a model trained with only 2-dimensional bonding information. Furthermore, random forest models performed best when both chemical and geometric molecular features were included in training, which indicates that both are important in defining a molecule’s preferred space groups.
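The comparison described in the abstract, top-k accuracy measured against a reference that always proposes the most frequent space groups, can be illustrated with a minimal sketch. This is not the authors' code: the function names `frequency_baseline` and `top_k_accuracy` are hypothetical, and the labels are toy values invented purely to show the calculation.

```python
from collections import Counter


def frequency_baseline(train_labels, k):
    """Return the k most frequent space groups among the training labels."""
    return [sg for sg, _ in Counter(train_labels).most_common(k)]


def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of samples whose true label is among the first k predictions."""
    hits = sum(
        true in ranked[:k]
        for ranked, true in zip(ranked_predictions, true_labels)
    )
    return hits / len(true_labels)


# Toy labels, invented for illustration only (not data from the paper).
train = ["P21/c", "P21/c", "P-1", "P212121", "P21/c", "P-1"]
test = ["P21/c", "P-1", "C2/c", "P21/c"]

# The frequency reference proposes the same ranked list for every molecule.
baseline_top1 = top_k_accuracy([frequency_baseline(train, 1)] * len(test), test, 1)
baseline_top2 = top_k_accuracy([frequency_baseline(train, 2)] * len(test), test, 2)
```

A trained model would instead supply a different ranked list per molecule, and its top-k accuracy would be compared against these reference values on the same test set.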
Gittins, Hannah and Day, Graeme M. (2026) Can machine learning predict the space group preference of organic molecules? Crystal Growth & Design. (In Press)
More information
Accepted/In Press date: 7 April 2026
Identifiers
Local EPrints ID: 511531
URI: http://eprints.soton.ac.uk/id/eprint/511531
ISSN: 1528-7483
PURE UUID: 84a9ace2-83ba-486f-9064-bf5e52d8f093
Catalogue record
Date deposited: 19 May 2026 16:31
Last modified: 21 May 2026 02:09
Contributors
Author:
Hannah Gittins