Data associated with the article "Can machine learning predict the space group preference of organic molecules?" by H. Gittins and G. M. Day (G.M.Day@soton.ac.uk). Associated with the University of Southampton, UK.

Dataset DOI: https://doi.org/10.5258/SOTON/D3912

Date of data collection: Oct 2022 - Sep 2025

Related projects: This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 856405). The authors acknowledge the use of the IRIDIS High Performance Computing Facility and associated support services at the University of Southampton in the completion of this work.

This dataset contains the data needed to replicate the models shown in the article, as well as the models themselves:
- RF_models and GNN_models contain the random forest (RF) and graph neural network (GNN) models shown in the article.
- RF_datasets and GNN_datasets contain the data used to train the models.

The scripts to train these models can be found here: https://github.com/hannahgittins/SG_ML_prediction/tree/main
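
As a quick check after downloading and extracting the dataset, the four folders named above can be inspected with a short script. This is only a minimal sketch: the function name `summarise_dataset` and the `"."` root path are illustrative assumptions, and nothing is assumed about the file formats inside each folder.

```python
from pathlib import Path

# Folder names as given in this README.
EXPECTED_FOLDERS = ("RF_models", "GNN_models", "RF_datasets", "GNN_datasets")


def summarise_dataset(root):
    """Map each expected folder to its sorted relative file paths, or None if absent.

    Hypothetical helper for illustration; it is not part of the published scripts.
    """
    root = Path(root)
    summary = {}
    for name in EXPECTED_FOLDERS:
        folder = root / name
        if not folder.is_dir():
            # Folder not found under the given root.
            summary[name] = None
            continue
        # Collect every file below the folder, relative to the folder itself.
        summary[name] = sorted(
            p.relative_to(folder).as_posix()
            for p in folder.rglob("*")
            if p.is_file()
        )
    return summary


if __name__ == "__main__":
    # "." is a placeholder: point it at the directory where the dataset was extracted.
    for name, files in summarise_dataset(".").items():
        status = "missing" if files is None else f"{len(files)} file(s)"
        print(f"{name}: {status}")
```

A folder reported as missing usually means the script was pointed at the wrong directory or the archive was only partly extracted.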