The University of Southampton
University of Southampton Institutional Repository

Machine learning methods for analysis of organic molecular crystal structure prediction landscapes

This thesis presents work on the analysis of the Crystal Structure Prediction (CSP) landscapes of organic molecules. The work adapted an existing approach to identifying stabilisable crystal structures from prediction sets, the Generalised Convex Hull (GCH) [1], so that its application to molecular crystal structures was more theoretically reasonable. A new global Smooth Overlap of Atomic Positions (SOAP) kernel was developed to define the similarity of molecular crystal structures and was then used within the GCH approach, which identifies stabilisable crystal structure candidates using unsupervised machine learning [1]. The results were compared with those from a GCH approach that used a simple average SOAP kernel, to assess the impact of kernel construction. The new kernel was assessed against three criteria useful to materials discovery: the effectiveness of the GCH in identifying stabilisable candidates, the interpretability of machine-learned (ML) descriptors derived from the kernel, and the utility of the kernel in machine learning of energies. The comparisons gave a complex picture, from which no clearly superior kernel for identifying stabilisable structures could be identified. However, the new kernel construction showed promise, particularly in leading to interpretable ML descriptors. The findings highlighted the sensitivity of similarity-kernel-based landscape analysis methods to kernel construction.

A secondary project developed a proof of concept for performing fast, approximate molecular CSP by forming and optimising structural analogues of previously predicted crystal structures of similar molecules. Preliminary results indicated strong potential for the method to predict the most crucial regions of the CSP landscape with greatly reduced sampling relative to quasi-random CSP approaches. This suggested that the concept is a promising area for further development, and some key areas for improvement were highlighted.

[1] A. Anelli, E. A. Engel, C. J. Pickard and M. Ceriotti, Phys. Rev. Mater., 2018, 2, 103804.
University of Southampton
Martin, Jennifer
979d288f-9864-4c69-aad6-226a9ad70ca0
Day, Graeme
e3be79ba-ad12-4461-b735-74d5c4355636
Coles, Simon
3116f58b-c30c-48cf-bdd5-397d1c1fecf8

Martin, Jennifer (2025) Machine learning methods for analysis of organic molecular crystal structure prediction landscapes. University of Southampton, Doctoral Thesis, 331pp.

Record type: Thesis (Doctoral)

Text
JMartin_Final_Thesis - Accepted Manuscript
Available under License University of Southampton Thesis Licence.
Download (29MB)
Text
Final-thesis-submission-Examination-Miss-Jennifer-Martin
Restricted to Repository staff only

More information

Published date: 2025

Identifiers

Local EPrints ID: 505997
URI: http://eprints.soton.ac.uk/id/eprint/505997
PURE UUID: 7f6f0fcd-42e9-44d9-a6f5-f3c29cc99f03
ORCID for Jennifer Martin: orcid.org/0009-0004-0343-6309
ORCID for Graeme Day: orcid.org/0000-0001-8396-2771
ORCID for Simon Coles: orcid.org/0000-0001-8414-9272

Catalogue record

Date deposited: 27 Oct 2025 17:47
Last modified: 28 Nov 2025 03:09

Contributors

Author: Jennifer Martin
Thesis advisor: Graeme Day
Thesis advisor: Simon Coles
