Machine learning methods for analysis of organic molecular crystal structure prediction landscapes
This thesis presents work on the analysis of the Crystal Structure Prediction (CSP) landscapes of organic molecules. The work adapted an existing approach to identifying stabilisable crystal structures from prediction sets, the Generalised Convex Hull (GCH) [1], so that its application to molecular crystal structures was more theoretically reasonable. The GCH identifies stabilisable crystal structure candidates using unsupervised machine learning [1]. A new global Smooth Overlap of Atomic Positions (SOAP) kernel was developed to define the similarity of molecular crystal structures and was then used within the GCH approach. The results were compared with those from a GCH approach that used a simple average SOAP kernel, to assess the impact of kernel construction. The new kernel was assessed against three metrics useful to materials discovery: the effectiveness of the GCH in identifying stabilisable candidates, the interpretability of machine-learned (ML) descriptors derived from the kernel, and the utility of the kernel in machine learning of energies. The comparisons gave a mixed picture, from which no clearly superior kernel for identifying stabilisable structures could be identified. However, the new kernel construction showed promise, particularly in leading to interpretable ML descriptors. The findings highlighted the sensitivity of similarity-kernel-based landscape analysis methods to kernel construction.
A secondary project developed a proof of concept for fast, approximate molecular CSP by forming and optimising structural analogues of previously predicted crystal structures of similar molecules. Preliminary results indicated strong potential for the method to predict the most crucial regions of the CSP landscape with greatly reduced sampling relative to quasi-random CSP approaches. This suggested that the concept is a promising area for further development, and some key areas for improvement were highlighted.
[1] A. Anelli, E. A. Engel, C. J. Pickard and M. Ceriotti, Phys. Rev. Mater., 2018, 2, 103804
University of Southampton
Martin, Jennifer
2025
Day, Graeme
Coles, Simon
Martin, Jennifer (2025) Machine learning methods for analysis of organic molecular crystal structure prediction landscapes. University of Southampton, Doctoral Thesis, 331pp.
Record type: Thesis (Doctoral)
More information
Published date: 2025
Identifiers
Local EPrints ID: 505997
URI: http://eprints.soton.ac.uk/id/eprint/505997
PURE UUID: 7f6f0fcd-42e9-44d9-a6f5-f3c29cc99f03
Catalogue record
Date deposited: 27 Oct 2025 17:47
Last modified: 28 Nov 2025 03:09
Contributors
Author: Jennifer Martin