README file for "Data for 'The Monte Carlo threshold algorithm applied to crystal structure prediction landscapes of polycyclic aromatic hydrocarbons'"

Dataset DOI: https://doi.org/10.5258/SOTON/D3710

This dataset supports the publication:
The Monte Carlo threshold algorithm applied to crystal structure prediction landscapes of polycyclic aromatic hydrocarbons.

Data
----

Data for each molecule are provided in separate .zip files (phenanthrene.zip, pyrene.zip, perylene.zip). Each folder contains a subfolder for each potential energy model used (fit, pahap, isopahap), and each of these subfolders contains the following files:

- csp.db: The crystal structure prediction (CSP) database, created using the mol-CSPy code (development version). The full database schema can be found here: https://mol-cspy.readthedocs.io/en/latest/sql_schema.html. The main table is called 'crystal' and has 6 columns: id, spacegroup, density, energy, molecule_id, file_content. 'id' is the unique identifier of the crystal structure; 'file_content' describes the crystal structure in the SHELXL format. The same crystal structures are provided in CIF format in the csp_structures.cif file.

- csp_structures.cif: A single CIF containing all the structures in the CSP landscape. Each structure is in its own data block, named after the crystal 'id'.

- csp_data.csv: A CSV file with data for each crystal structure in the CSP database (energy, density, id).

- disconn.gml: The disconnectivity graph built from the Monte Carlo threshold (MCT) trajectories, saved in the GML format. Each node in the graph carries metadata, including a "label" entry that holds the crystal ID.

- packing_motif.csv: The packing motif of the crystal structures that lie within 30 kJ/mol of the global energy minimum of the MCT trajectories. The first column is the crystal ID (it corresponds to the "label" entry of the graph nodes in the disconn.gml file); the second column is the packing motif (s: sandwich-herringbone, h: herringbone, g: gamma, b: beta).
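
Example
-------

The file descriptions above can be put to use with a short script. The following is a minimal sketch, not part of the mol-CSPy code: it uses only the Python standard library, the function names (lowest_energy_structures, read_packing_motifs) are hypothetical, and it assumes that the 'crystal' table can be queried directly with SQLite and that packing_motif.csv is comma-separated. Rows whose second column is not one of the four motif codes (for example a header row, if one is present) are skipped.

```python
import csv
import sqlite3

# Motif codes as listed in the packing_motif.csv description.
MOTIF_NAMES = {
    "s": "sandwich-herringbone",
    "h": "herringbone",
    "g": "gamma",
    "b": "beta",
}


def lowest_energy_structures(db_path, limit=10):
    """Return (id, spacegroup, density, energy) for the lowest-energy crystals.

    Assumes the 'crystal' table described in this README; rows without an
    energy value are ignored.
    """
    con = sqlite3.connect(db_path)
    try:
        cur = con.execute(
            "SELECT id, spacegroup, density, energy FROM crystal "
            "WHERE energy IS NOT NULL ORDER BY energy ASC LIMIT ?",
            (limit,),
        )
        return cur.fetchall()
    finally:
        con.close()


def read_packing_motifs(csv_path):
    """Map crystal ID -> full packing motif name.

    Assumption: comma-separated file, crystal ID in the first column and the
    one-letter motif code in the second. Any other row is skipped.
    """
    motifs = {}
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue
            code = row[1].strip()
            if code in MOTIF_NAMES:
                motifs[row[0].strip()] = MOTIF_NAMES[code]
    return motifs
```

For instance, after extracting phenanthrene.zip, calling lowest_energy_structures("phenanthrene/fit/csp.db", limit=5) would list the five lowest-energy structures for the 'fit' model, and read_packing_motifs("phenanthrene/fit/packing_motif.csv") would return the motif of each labelled structure. The paths shown are assumed from the folder layout described above and may need adjusting.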