READ ME File For 'Dataset title'

Dataset DOI: https://doi.org/10.5258/SOTON/D3745

Date that the file was created: 02/2025

-------------------
GENERAL INFORMATION
-------------------

ReadMe Author: JOSHUA DICKMAN, University of Southampton

Date of data collection: 2024 - 2025

Related projects: ADAM (Autonomous Discovery of Advanced Materials, https://cordis.europa.eu/project/id/856405)

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Licenses/restrictions placed on the data, or limitations of reuse: Creative Commons Attribution

--------------------
DATA & FILE OVERVIEW
--------------------

This dataset contains:

GA_database_pure.db
GA_database_pure_README.txt

--------------------------
METHODOLOGICAL INFORMATION
--------------------------

Molecules were sampled using the Day Group's CSPy code base, with the MolBuilder genetic algorithm tool.

The molecules sampled with these genetic algorithms are stored in a SQLite database, alongside calculated properties, additional information about the molecular structures, and metadata regarding their discovery within genetic algorithms.

There are multiple tables within this database, which can be organised into groups:

- Molecular structures are stored with unique SmilesID numbers, alongside both the SMILES and InChIKey representations, in the {Molecules} table.
- Counts of each molecular building block type (rings and side groups) are stored in {BuildingBlocks}, and representations of the molecular 'shape' are stored in {Scaffolds}.
- The SmilesIDs for each molecule in each of the 36 initial populations are stored in a reference table {InitPops}.
- Summaries of each genetic algorithm are stored within {GeneticAlgorithms}, where each run is assigned a unique JobID.
- Post-hoc calculations are also summarised, under unique JobID values in the {ExtraCalculations} table.
- Information regarding when molecules were sampled by genetic algorithms is found within {PopInfo}, where the unique SmilesID is stored with the JobID of the genetic algorithm, and the generation number (GenNumber) where it was found.
- Calculated properties are stored in the three results tables: {SynthResults} for values related to synthetic difficulty, {ReorgResults} for calculated electron reorganisation energies, and {CspResults} for calculated electron mobilities, which were obtained through the use of predicted crystal structure landscapes. For each molecule, the first JobID with a valid property value is also stored (i.e. the first genetic algorithm where the property was calculated and used to bias fitness).
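
The table layout described above could be explored with Python's built-in sqlite3 module. The following is a minimal sketch, not part of the dataset or of CSPy. The helper names (list_tables, first_generation_found, build_toy_database) are hypothetical, and the query assumes that {PopInfo} has columns spelled exactly SmilesID, JobID and GenNumber; the SMILES and InChIKey column names in the toy {Molecules} table are also guesses. Since this README does not list column names, list_tables reads them from the database itself, and it is worth running that first to check the assumptions. The toy in-memory database holds made-up rows and only exists so the sketch runs without the real file.

```python
import sqlite3


def list_tables(conn):
    """Return {table name: [column names]}, read from the database itself."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    schema = {}
    for (table,) in cur.fetchall():
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [col[1] for col in cols]
    return schema


def first_generation_found(conn, job_id):
    """Map each SmilesID sampled in one run to the earliest generation it appears in.

    ASSUMPTION: PopInfo has columns named SmilesID, JobID and GenNumber.
    """
    rows = conn.execute(
        "SELECT SmilesID, MIN(GenNumber) FROM PopInfo "
        "WHERE JobID = ? GROUP BY SmilesID ORDER BY SmilesID",
        (job_id,),
    ).fetchall()
    return dict(rows)


def build_toy_database():
    """In-memory stand-in with made-up rows, used only to exercise the helpers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Molecules (SmilesID INTEGER PRIMARY KEY, SMILES TEXT, InChIKey TEXT);
        CREATE TABLE PopInfo (SmilesID INTEGER, JobID INTEGER, GenNumber INTEGER);
        INSERT INTO Molecules VALUES (1, 'c1ccccc1', 'PLACEHOLDER-KEY-1');
        INSERT INTO Molecules VALUES (2, 'c1ccncc1', 'PLACEHOLDER-KEY-2');
        INSERT INTO PopInfo VALUES (1, 10, 0), (1, 10, 3), (2, 10, 2), (2, 11, 5);
        """
    )
    return conn


if __name__ == "__main__":
    conn = build_toy_database()
    # For the real file, a read-only connection could be opened instead:
    # conn = sqlite3.connect("file:GA_database_pure.db?mode=ro", uri=True)
    for table, columns in list_tables(conn).items():
        print(table, columns)
    print(first_generation_found(conn, 10))
```

Reading the column names from the database before writing any query keeps the sketch independent of naming details that this README does not specify. The same GROUP BY pattern could be joined to {Molecules} on SmilesID once the actual column names have been confirmed.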