Graph kernel extensions and experiments with application to molecule classification, lead hopping and multiple targets
The discovery of drugs that can effectively treat disease and alleviate pain is one of the core challenges facing modern medicine. The tools and techniques of machine learning have perhaps the greatest potential to provide a fast and efficient route toward the fabrication of novel and effective drugs. In particular, modern structured kernel methods have been successfully applied to a range of problem domains and have recently been adapted for graph structures, making them directly applicable to pharmaceutical drug discovery. Specifically, graph structures are a natural fit for molecular data, in that a graph consists of a set of nodes that represent atoms, connected by edges that represent bonds. In this thesis we use graph kernels that utilize three different graph representations: molecular, topological pharmacophore and reduced graphs. We introduce a set of novel graph kernels which are based on a measure of the number of finite walks within a graph. To calculate this measure we employ a dynamic programming framework, which allows us to extend graph kernels so that they support non-tottering, soft matching and the inclusion of gaps. In addition, we review several graph colouring methods and subsequently incorporate colour into our graph kernel models. These kernels are designed for molecule classification in general, although we show how they can be adapted to other areas in drug discovery. We conduct three sets of experiments and discuss how our augmented graph kernels are designed and adapted for these areas. First, we classify molecules based on their activity against a biological target. Second, we explore the related problem of lead hopping, in which one set of chemicals is used to predict another that is structurally dissimilar. We discuss the problems that arise because some patterns are filtered from the dataset.
By analyzing lead hopping we are able to go beyond the typical cross-validation approach and construct a dataset that more accurately reflects real-world tasks. Lastly, we explore methods of integrating information from multiple targets. We test our models as a multi-response problem and later introduce a new approach that employs Kernel Canonical Correlation Analysis (KCCA) to predict the best molecules for an unseen target. Overall, we show that graph kernels achieve good results in classification, lead hopping and multiple target experiments.
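The walk-counting idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the kernel defined in the thesis: it assumes plain atom labels, ignores bond labels, and does not implement the non-tottering, soft matching or gap extensions. The function name `walk_kernel` and the adjacency-list input format are hypothetical choices made for this illustration. The sketch counts pairs of walks, one from each graph, that have the same label sequence and at most `max_len` edges, using dynamic programming over pairs of identically labelled nodes.

```python
from itertools import product


def walk_kernel(labels_a, adj_a, labels_b, adj_b, max_len):
    """Count pairs of label-matching walks with at most max_len edges.

    labels_x: list of node labels (e.g. atom symbols), indexed by node id.
    adj_x:    list of neighbour lists, adj_x[u] = nodes adjacent to u.
    Illustrative assumption: only node labels are compared.
    """
    # Node pairs (u in graph A, v in graph B) that carry the same label.
    pairs = [
        (u, v)
        for u, v in product(range(len(labels_a)), range(len(labels_b)))
        if labels_a[u] == labels_b[v]
    ]
    # counts[(u, v)]: number of matching walk pairs of the current
    # length that end at u in graph A and at v in graph B.
    counts = {p: 1 for p in pairs}
    total = sum(counts.values())  # walks with zero edges
    for _ in range(max_len):
        # Extend every matching walk pair by one edge in both graphs.
        counts = {
            (u, v): sum(
                counts.get((s, t), 0) for s in adj_a[u] for t in adj_b[v]
            )
            for (u, v) in pairs
        }
        total += sum(counts.values())
    return total


# Toy graphs: a C-C-O chain and a C-O pair (atom labels only).
chain_labels, chain_adj = ["C", "C", "O"], [[1], [0, 2], [1]]
pair_labels, pair_adj = ["C", "O"], [[1], [0]]
print(walk_kernel(chain_labels, chain_adj, pair_labels, pair_adj, 1))
```

Because each step only needs the counts from the previous walk length, the cost grows linearly with `max_len` rather than with the number of walks. Note that this simple recursion also counts tottering walks (walks that immediately return to the node they just left).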
Demco, Anthony A.
ca1fc24f-cbb2-4e3e-b280-a1f8e4412fb2
February 2009
Saunders, Craig
26634635-4d4d-4469-b9ec-1d68788aa47a
Demco, Anthony A. (2009) Graph kernel extensions and experiments with application to molecule classification, lead hopping and multiple targets. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 153pp.
Record type: Thesis (Doctoral)
Text
AnthonyDemco-Thesis.pdf
- Other
More information
Published date: February 2009
Organisations:
University of Southampton
Identifiers
Local EPrints ID: 66209
URI: http://eprints.soton.ac.uk/id/eprint/66209
PURE UUID: d10deb75-d308-41c6-95e2-74ea94342f32
Catalogue record
Date deposited: 13 May 2009
Last modified: 13 Mar 2024 18:13
Contributors
Author: Anthony A. Demco
Thesis advisor: Craig Saunders