Dos Santos Morado, Joao Pedro (2022) Methods for accurate and efficient simulation of the conformational landscape of ligands. University of Southampton, Doctoral Thesis, 323pp.
Abstract
Ligand modelling is an essential element of drug discovery. To accurately simulate chemical and physical phenomena, it is necessary to employ molecular models that provide reliable results in a timely fashion. The gold standard method in ligand modelling remains quantum mechanics (QM). Owing to the high computational cost of QM methods, their use in ab initio simulations is limited to the simplest systems. Molecular mechanics force fields (MM FFs) have also been around for decades. They stand as the cheapest alternative to QM methods, despite their widely known accuracy limitations. Machine-learning (ML) potentials are a promising new alternative to FFs. ML potentials are molecular models based on artificial intelligence, seemingly more flexible and accurate than FFs, although more computationally costly. For a given FF functional form, the quality of the parameterisation is crucial and determines how accurately observable properties can be computed from simulations. Whilst accurate FF parameterisations are available for biomolecules, the parameterisation of novel drug candidates is particularly challenging, as these may involve functional groups and interactions for which accurate parameters are not available. To address the problem of FF accuracy, we developed ParaMol, software capable of reparameterising class I FFs with a special focus on druglike molecules. We demonstrate that, within the constraints of a FF functional form, ParaMol can derive near-ideal FF parameters. Additionally, we illustrate the best practices to follow when employing specific parameterisation routes; the sensitivity of different fitting data sets, such as relaxed dihedral scans and configurational ensembles, to the parameterisation procedure; and the features of the various weighting methods available to weight configurations. Monte Carlo (MC) and molecular dynamics (MD) simulations can be performed using FFs, ML potentials, or QM methods.
The higher the level of theory used in MD or MC simulations, the more reliable the structural information extracted from them will be, albeit at a higher computational cost. To combine the accuracy of ab initio simulations with the efficiency of classical ones, we present a multilevel MC method that allows quantum configurational ensembles to be generated while keeping the computational cost to a minimum. We show that FF reparameterisation is an efficient route to generate FFs that reproduce QM results more closely, which in turn can be used as low-cost models to approach the gold standard QM accuracy. We demonstrate that the MC acceptance rate is strongly correlated with various phase space overlap measurements, constituting a robust metric to evaluate the similarity between any two levels of theory. As more advanced applications, we apply the nMC-MC algorithm to generate the QM/MM distribution of a ligand in aqueous solution and present a self-parameterising version of the method. Recently, ML potentials have emerged as an alternative to FFs. However, owing to their novelty, there are many unanswered questions concerning their applicability that must be addressed. To this end, we present a comparative study that evaluates the performance of an ML potential, a traditional FF, and an optimally tuned FF in the modelling of a set of 10 γ-fluorohydrins that exhibit a complex interplay between intra- and intermolecular interactions in determining conformer stability. For this set of molecules, we benchmark the performance of each molecular model, evaluating their energetic, geometric, and sampling accuracy relative to quantum mechanical data, both in the gas phase and in chloroform solution. We also assess the performance of the aforementioned molecular models in estimating J-coupling constants by comparing their predictions to experimental data available in chloroform.
We then discuss and highlight the strengths and weaknesses of each model, providing guidelines for future development of FFs and ML potentials. The complexity and scope of the problems addressed in this thesis preclude complete or definitive solutions. Even so, we believe the outcomes of this work may have implications for different areas of chemistry and biology, especially for those interested in modelling the conformational landscape of small organic molecules. The overall conclusions of this thesis are: FFs can be reliably parameterised in an automated fashion using ParaMol; optimally tuned FFs can work as gateways to generate QM ensembles, at least for small molecules in the gas phase; despite the ability of ML potentials to reproduce their training data, the transferability of ML potentials to other domains is limited, and conventional FFs still play an important role in molecular simulations.
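The multilevel MC idea summarised in the abstract can be illustrated with a minimal sketch. The code below is not the thesis's implementation: it assumes the standard nested Metropolis scheme, in which a short chain is run on a cheap energy surface and its end point is then accepted or rejected using the difference between the high-level and low-level energy changes. The one-dimensional potentials `e_low` and `e_high`, the function names, and all parameter values are hypothetical stand-ins for an FF and a QM method.

```python
import math
import random


def e_low(x):
    """Hypothetical cheap model (stand-in for an FF): harmonic well."""
    return 0.5 * x * x


def e_high(x):
    """Hypothetical expensive model (stand-in for QM): anharmonic well."""
    return 0.5 * x * x + 0.1 * x ** 4


def low_level_chain(x, energy, n_steps, beta, step, rng):
    """Run n_steps of Metropolis MC on the low-level energy surface."""
    e = energy(x)
    for _ in range(n_steps):
        x_trial = x + rng.uniform(-step, step)  # symmetric proposal
        e_trial = energy(x_trial)
        if e_trial <= e or rng.random() < math.exp(-beta * (e_trial - e)):
            x, e = x_trial, e_trial
    return x


def nested_mc(low, high, n_outer, n_inner, beta=1.0, step=0.5, seed=0):
    """Sample the high-level distribution using low-level chains as proposals.

    Returns the list of outer-loop samples and the outer acceptance rate.
    """
    rng = random.Random(seed)
    x = 0.0
    samples = []
    accepted = 0
    for _ in range(n_outer):
        x_trial = low_level_chain(x, low, n_inner, beta, step, rng)
        # Change in high-level energy minus change in low-level energy.
        delta = (high(x_trial) - high(x)) - (low(x_trial) - low(x))
        if delta <= 0.0 or rng.random() < math.exp(-beta * delta):
            x = x_trial
            accepted += 1
        samples.append(x)
    return samples, accepted / n_outer


if __name__ == "__main__":
    samples, rate = nested_mc(e_low, e_high, n_outer=2000, n_inner=20)
    print(f"outer acceptance rate: {rate:.2f}")
```

In this sketch the expensive energy is evaluated only once per outer step, while the inner chain uses the cheap model alone. When the two levels coincide, the outer test always accepts, which is consistent with the abstract's use of the acceptance rate as a measure of similarity between two levels of theory.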