READ ME File For 'CSP-EA sampled molecules and crystal structures, with associated mobilities'

DOI:

Date that the file was created: July 2025

-------------------
GENERAL INFORMATION
-------------------

ReadMe Author: Jay Johal, University of Southampton [ORCID: 0000-0001-8489-4803]

Date of data collection: 2021-2024

Information about geographic location of data collection: University of Southampton, UK

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Licenses/restrictions placed on the data, or limitations of reuse: CC-BY

Recommended citation for the data:

This dataset supports the publication:
AUTHORS: Jay Johal, Graeme M. Day
TITLE: Exploring organic chemical space using crystal structure prediction informed evolutionary design
JOURNAL: Nature Communications
PAPER DOI IF KNOWN:

--------------------
DATA & FILE OVERVIEW
--------------------

This dataset contains:

Evaluated_top31_molecules_reorg_and_mobility_values.xlsx
- A summary of the calculated reorganisation energies and mobilities of the 31 top molecules (21 from CSP-EA and 10 from Reorg_EA studies) at each level of sub-sampling evaluated.
- Molecules are named by their Molecule_Id, SMILES and InChiKey.
- The molecules found using each search setting are identified by their rankings.
- For the mobilities, GM refers to the mobility of the global minimum at that level of sampling, and Avg refers to the landscape average over the low-energy (7.2 kJ/mol) window at that level of sampling.

Top31_sampled_molecules.zip, for each of the top 31 molecules evaluated:
- A .cif file, named using the molecule's InChiKey, containing all the crystal structures in the low-energy window (within 7.2 kJ/mol of the global energy minimum on the landscape) from the comprehensive CSP sampling.
- A .csv file containing information on each of the above-mentioned crystal structures.
  The headers and units are shown below:
  crystal id | spacegroup | density [g/cm^3] | energy [kJ/mol] | landscape averaged mobility (mobility) [cm^2/Vs] | predicted packing motif (pred)

Sub-sampling_benchmark_molecules_CSP_data.zip
- benchmark_molecule_sampling_runtime.csv, a summary of the sampling evaluations performed.
- For each of the 20 molecules used to benchmark the costs of the sub-sampling schemes:
  - A .cif file containing all the crystal structures in the low-energy window (within 7.2 kJ/mol of the global energy minimum on the landscape) from the comprehensive CSP sampling.
  - A .csv file containing information on each of the above-mentioned crystal structures.
    The headers and units are shown below:
    crystal id (id) | spacegroup | density [g/cm^3] | energy [kJ/mol] | minimization_step | quasi-random seed (trial_number) | minimization_time (s) | metadata

Ini_files.zip
- The final config (.ini) files for each CSP-EA search performed. These contain all the settings needed to set up the calculations, as well as the molecules sampled within each generation.
- A .csv file relating the InChiKey used for file naming to the Inchi string stored in the .ini files.
- Mobilities stored in the calculated_properties dictionary are in cm^2/Vs.

LowE_Mobilities.zip
- Files are grouped by search setting (SG14-500, Top5-500, Top10-500, SamplingA and GM_Search). Each unique molecule sampled in the aggregated CSP-EA repeats for a given search setting has a .csv file containing the calculated mobilities of the individual crystals sampled for that molecule.
- For the landscape-averaged search settings, all the individual crystals within the low-energy (7.2 kJ/mol) window on which mobilities were evaluated are listed.
- For the global-minimum-targeting searches, only the mobility of the global minimum crystal structure is recorded.
  The headers and units in the .csv files are shown below:
  crystal id | mobility [cm^2/Vs]

CSP_data_*.zip
- The CSP landscapes and structures for the molecules sampled in each EA, grouped by the CSP sampling used (SG14-500, Top5-500, Top10-500, SamplingA and GM_Search). Due to the size of the dataset, SamplingA and GM_Search have been split.
- As in LowE_Mobilities.zip, files are grouped by search setting. For each unique molecule sampled in the aggregated CSP-EA repeats for a given search setting:
  - A .cif file containing all the crystal structures in the low-energy window (within 7.2 kJ/mol of the global energy minimum on the landscape) from the comprehensive CSP sampling, or only the global minimum structure for GM_Search.
  - A .csv file containing information on each of the above-mentioned crystal structures.
    The headers and units are shown below:
    crystal id (id) | spacegroup | density [g/cm^3] | energy [kJ/mol] | minimization_step | quasi-random seed (trial_number) | minimization_time (s) | metadata

AOM_sPOD_comparisons.zip
- Contains 2 sets of comparisons between the electronic couplings calculated using sPOD and AOM, as described in the related publication.
- Aza_M1-M21_AOM_sPOD refers to dimers taken from crystals in the low-energy (7.2 kJ/mol) window of the top molecules.
- known_molecules_AOM_sPOD refers to dimers taken from the known experimental crystal structures evaluated as part of the benchmarking of the mobility calculations.

Packing_motif_example_crystals.zip
- BEDT-TTF.cif (CIZMON03) = Sandwich Herringbone packing
- pentacene-I.cif (PENCEN) = Herringbone packing
- M2.cif = Beta packing
- M4.cif = Gamma packing

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Description of methods used for collection/generation of data: See accompanying publication.

Methods for processing the data: Post-processed using in-house Python code to generate CIF files and CSV files of energy rankings.
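
Example of reading a per-molecule .csv file: the sketch below is illustrative only and is not the in-house code used to produce this dataset. It shows one way to rank the structures in a CSP .csv file by energy and report each structure's energy relative to the lowest-energy structure, optionally applying an energy window such as the 7.2 kJ/mol window used here. The column names "id" and "energy" are assumptions based on the header descriptions in this README; check the header row of the actual files before use. The numbers in the example are made up.

```python
import csv
import io

WINDOW_KJ_MOL = 7.2  # low-energy window quoted in this README


def low_energy_window(csv_file, energy_col="energy", window=WINDOW_KJ_MOL):
    """Return rows within `window` kJ/mol of the lowest energy, sorted by energy.

    `csv_file` is any open text file (or file-like object) with a header row.
    `energy_col` is an ASSUMED column name; adjust it to match the real files.
    Each returned row gains a 'relative_energy' key (kJ/mol above the minimum).
    """
    rows = list(csv.DictReader(csv_file))
    if not rows:
        return []
    energies = [float(row[energy_col]) for row in rows]
    e_min = min(energies)
    selected = []
    for row, energy in zip(rows, energies):
        relative = energy - e_min
        if relative <= window:
            selected.append({**row, "relative_energy": relative})
    selected.sort(key=lambda row: row["relative_energy"])
    return selected


# Made-up example data in the assumed layout (not taken from the dataset).
example = io.StringIO(
    "id,spacegroup,density,energy\n"
    "struct-a,14,1.40,-115.0\n"
    "struct-b,2,1.38,-120.0\n"
    "struct-c,19,1.35,-110.0\n"
)

for row in low_energy_window(example):
    print(row["id"], row["spacegroup"], round(row["relative_energy"], 2))
```

To use this on a file from the dataset, open it with open(path, newline="") and pass the file object to low_energy_window in place of the example data.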