READ ME File For 'Computational data for: "Machine Learned Potentials by Active Learning from Organic Crystal Structure Prediction Landscapes"'

Dataset DOI: 10.5258/SOTON/D2840

Date that the file was created: October, 2023

-------------------
GENERAL INFORMATION
-------------------

ReadMe Author: Patrick Butler, University of Southampton

This dataset supports the publication:
AUTHORS: Butler, Patrick W. V., Hafizi, Roohollah and Day, Graeme M.
TITLE: Machine Learned Potentials by Active Learning from Organic Crystal Structure Prediction Landscapes
JOURNAL:
PAPER DOI IF KNOWN:

--------------------------
SHARING/ACCESS INFORMATION
--------------------------

Licenses/restrictions placed on the data, or limitations of reuse: CC-BY

--------------------
DATA & FILE OVERVIEW
--------------------

Computational data for "Reducing overprediction of molecular crystals via threshold clustering"

This repository contains the landscapes described in the article. For each system, the structures before and after threshold clustering are given. Additionally, csv files are provided that contain the calculated energy and density for all structures found and for the basin minima identified from threshold clustering.

The dataset contains:

Oxalic_Acid_landscape.zip
    committee_NNP: contains weights for each of the 5 folds in the 5-fold cross-validation using the parameters listed in entry 6 of Table 1 in the manuscript.
        Also includes the training sets and NNP settings.
    csp: contains the structures and data predicted for oxalic acid, lattice energy minimized with the FIT+DMA model.

Oxalic_Acid_on_the_fly.zip
    1_cNNP_before_OTF: contains weights for the 6-member committee neural network potential (cNNP) trained on the predicted oxalic acid landscape before on-the-fly training.
        Includes the training set and NNP settings.
    2_cNNP_after_OTF: contains weights for the 6-member committee neural network potential (cNNP) trained on the predicted oxalic acid landscape after on-the-fly training.
        Includes the training set and NNP settings.
    fit+dma_trajectories: contains the structures sampled from the FIT+DMA MC trajectories initiated from the alpha and beta oxalic acid polymorphs, together with the energies estimated by FIT+DMA, the cNNP before on-the-fly training, and the cNNP after on-the-fly training.
    Oxalic_acid_top300: contains the 300 lowest-energy structures from the FIT+DMA CSP of oxalic acid and the 10 structures sampled from this set by farthest point sampling.

Resorcinol_landscape.zip
    committee_NNP: contains weights for the 18-member committee neural network potential used to correct the initial CSP landscape.
        Includes the training set and NNP settings.
    csp: contains the structures and data predicted for resorcinol, energy minimized with DFTB-D3.
    predictions: contains the predicted corrections of the low-level CSP landscape to the PBE-D3 level.

TTBI_landscape.zip
    committee_NNP: contains weights for the 18-member committee neural network potential used to correct the initial CSP landscape.
        Includes the training set and NNP settings.
    csp: contains the structures and data predicted for TTBI, lattice energy minimized with the FIT+DMA model.
    predictions: contains the predicted corrections of the low-level CSP landscape to the PBE-D3 level.

figures.zip
    contains Jupyter notebooks for generating the figures in the main text.
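
Illustrative note on farthest point sampling (not part of the deposited data):
The Oxalic_acid_top300 entry above mentions that 10 structures were selected by farthest point sampling. For readers unfamiliar with the technique, a minimal generic sketch is given below. It assumes each structure is represented by a numeric descriptor vector and that distances are Euclidean; both are assumptions made for illustration, and the descriptors, distance measure, and starting structure actually used are described in the publication, not here. The function name farthest_point_sampling is hypothetical.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices from `points` (assumed distinct vectors).

    Each new pick is the point whose distance to its nearest
    already-selected point is largest (Euclidean distance assumed).
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # The next pick is the point farthest from the current selection.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        # Update nearest-selected distances with the new pick.
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected
```

For example, calling farthest_point_sampling(descriptors, 10) on a list of 300 descriptor vectors would return the indices of 10 mutually distant entries, starting from the first one.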