University of Southampton Institutional Repository

Improved understanding of aqueous solubility modeling through topological data analysis


Pirashvili, Mariam, Brodzki, Jacek, Belchi guillamon, Francisco, Niranjan, Mahesan, Frey, Jeremy G. and Steinberg, Lee (2018) Improved understanding of aqueous solubility modeling through topological data analysis. Journal of Cheminformatics, 10, [54]. (doi:10.1186/s13321-018-0308-5).

Record type: Article

Abstract

Topological data analysis (TDA) is a family of recent mathematical techniques that seek to understand the ‘shape’ of data. Here it has been used to understand, from the point of view of solubility, the structure of the descriptor space produced by a standard chemical informatics software package. We have used the mapper algorithm, a TDA method that creates low-dimensional representations of data, to create a network visualization of the solubility space. While descriptors with clear chemical implications are prominent features in this space, reflecting their importance to the chemical properties, an unexpected and interesting correlation between chlorine content and rings, and its implication for solubility prediction, is revealed. A parallel representation of the chemical space was generated using persistent homology applied to molecular graphs. Links between this chemical space and the descriptor space were shown to be in agreement with chemical heuristics. The use of persistent homology on molecular graphs, extended by the use of norms on the associated persistence landscapes, allows the conversion of discrete shape descriptors to continuous ones, and a perspective on the application of these descriptors to quantitative structure-property relations is presented.
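
The mapper construction mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`mapper_graph`, `_clusters`, `_dist`), the parameters `n_intervals`, `overlap` and `eps`, and the choice of single-linkage clustering with a fixed distance threshold are all assumptions made for illustration. The sketch covers the range of a one-dimensional lens function with overlapping intervals, clusters the points that fall in each interval, and joins two clusters whenever they share a point.

```python
from itertools import combinations


def mapper_graph(points, lens, n_intervals=4, overlap=0.25, eps=1.0):
    """Toy mapper: cover the lens range, cluster each preimage, link overlaps.

    All parameter names and defaults are hypothetical, chosen for this sketch.
    """
    values = [lens(p) for p in points]
    lo, hi = min(values), max(values)
    # Fall back to unit length if every lens value is identical.
    length = (hi - lo) / n_intervals or 1.0
    pad = overlap * length  # how far each interval extends past its neighbours

    nodes = []  # each node is a frozenset of point indices
    for i in range(n_intervals):
        a = lo + i * length - pad
        b = lo + (i + 1) * length + pad
        members = [j for j, v in enumerate(values) if a <= v <= b]
        nodes.extend(_clusters(points, members, eps))

    # Two nodes are connected when their clusters have a point in common.
    edges = {
        (u, v)
        for u, v in combinations(range(len(nodes)), 2)
        if nodes[u] & nodes[v]
    }
    return nodes, edges


def _clusters(points, members, eps):
    """Single-linkage clusters: connected components of the eps-neighbour graph."""
    unseen = set(members)
    out = []
    while unseen:
        stack = [unseen.pop()]
        comp = set(stack)
        while stack:
            j = stack.pop()
            near = {k for k in unseen if _dist(points[j], points[k]) <= eps}
            unseen -= near
            comp |= near
            stack.extend(near)
        out.append(frozenset(comp))
    return out


def _dist(p, q):
    """Euclidean distance between two equal-length coordinate tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
```

For example, calling `mapper_graph` on five equally spaced points on a line, with the x-coordinate as the lens, two intervals and `eps=1.0`, gives two nodes that share the middle point and are joined by a single edge. In a descriptor-space setting the points would be descriptor vectors and the lens could be a property such as solubility, but that mapping is an assumption of this sketch rather than a detail taken from the record.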

Text
s13321-018-0308-5 - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 8 November 2018
e-pub ahead of print date: 20 November 2018
Published date: 28 November 2018

Identifiers

Local EPrints ID: 426235
URI: http://eprints.soton.ac.uk/id/eprint/426235
ISSN: 1758-2946
PURE UUID: 78f4a49e-29a2-4132-b7b3-b177316745f6
ORCID for Jacek Brodzki: orcid.org/0000-0002-4524-1081
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X
ORCID for Jeremy G. Frey: orcid.org/0000-0003-0842-4302

Catalogue record

Date deposited: 20 Nov 2018 17:30
Last modified: 16 Mar 2024 03:55

Contributors

Author: Mariam Pirashvili
Author: Jacek Brodzki
Author: Francisco Belchi guillamon
Author: Mahesan Niranjan
Author: Jeremy G. Frey
Author: Lee Steinberg
