Graph kernel extensions and experiments with application to molecule classification, lead hopping and multiple targets

Demco, Anthony A. (2009) Graph kernel extensions and experiments with application to molecule classification, lead hopping and multiple targets. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 153pp.

Record type: Thesis (Doctoral)

Abstract

The discovery of drugs that can effectively treat disease and alleviate pain is one of the core challenges facing modern medicine. The tools and techniques of machine learning have perhaps the greatest potential to provide a fast and efficient route toward the fabrication of novel and effective drugs. In particular, modern structured kernel methods have been successfully applied to a range of problem domains and have recently been adapted for graph structures, making them directly applicable to pharmaceutical drug discovery. Specifically, graph structures have a natural fit with molecular data, in that a graph consists of a set of nodes that represent atoms, connected by edges that represent bonds. In this thesis we use graph kernels that utilize three different graph representations: molecular, topological pharmacophore and reduced graphs. We introduce a set of novel graph kernels which are based on a measure of the number of finite walks within a graph. To calculate this measure we employ a dynamic programming framework which allows us to extend graph kernels so that they can deal with non-tottering and soft matching and allow the inclusion of gaps. In addition, we review several graph colouring methods and subsequently incorporate colour into our graph kernel models. These kernels are designed for molecule classification in general, although we show how they can be adapted to other areas in drug discovery. We conduct three sets of experiments and discuss how our augmented graph kernels are designed and adapted for these areas. First, we classify molecules based on their activity in comparison to a biological target. Second, we explore the related problem of lead hopping. Here one set of chemicals is used to predict another that is structurally dissimilar. We discuss the problems that arise due to the fact that some patterns are filtered from the dataset. By analyzing lead hopping we are able to go beyond the typical cross-validation approach and construct a dataset that more accurately reflects real-world tasks. Lastly, we explore methods of integrating information from multiple targets. We test our models as a multi-response problem and later introduce a new approach that employs Kernel Canonical Correlation Analysis (KCCA) to predict the best molecules for an unseen target. Overall, we show that graph kernels achieve good results in classification, lead hopping and multiple target experiments.
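
As an illustration of the walk-counting idea mentioned in the abstract, the following is a minimal sketch of a generic finite-walk kernel computed by dynamic programming. It is not the thesis's implementation: the function name `walk_kernel`, the `(labels, adjacency)` graph encoding and the toy molecules are assumptions made for this example, and the sketch permits tottering walks and uses exact label matching only, with no gaps or soft matching.

```python
def walk_kernel(graph_a, graph_b, max_len):
    """Count pairs of label-matching walks with at most max_len edges.

    Each graph is a hypothetical (labels, adjacency) pair: labels[i] is the
    atom label of node i and adjacency[i] lists the neighbours of node i.
    """
    labels_a, adj_a = graph_a
    labels_b, adj_b = graph_b

    # counts[(u, v)]: number of matching walk pairs of the current length
    # that end at node u in graph A and node v in graph B.
    counts = {
        (u, v): 1
        for u in range(len(labels_a))
        for v in range(len(labels_b))
        if labels_a[u] == labels_b[v]
    }
    total = sum(counts.values())  # walks of length 0 (single matching nodes)

    for _ in range(max_len):
        extended = {}
        for (u, v), c in counts.items():
            # Extend both walks by one edge, keeping only matching end labels.
            for u2 in adj_a[u]:
                for v2 in adj_b[v]:
                    if labels_a[u2] == labels_b[v2]:
                        extended[(u2, v2)] = extended.get((u2, v2), 0) + c
        counts = extended
        total += sum(counts.values())

    return total


# Toy graphs (assumed for illustration): a C-O fragment and a C-O-C fragment.
FRAGMENT_CO = (["C", "O"], [[1], [0]])
FRAGMENT_COC = (["C", "O", "C"], [[1], [0, 2], [1]])

if __name__ == "__main__":
    print(walk_kernel(FRAGMENT_CO, FRAGMENT_COC, max_len=1))
```

Each step of the loop reuses the counts from the previous walk length, so the cost grows linearly with `max_len` rather than with the number of walks.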

Text

AnthonyDemco-Thesis.pdf - Other

More information

Published date: February 2009

Organisations: University of Southampton

Identifiers

Local EPrints ID: 66209

URI: http://eprints.soton.ac.uk/id/eprint/66209

PURE UUID: d10deb75-d308-41c6-95e2-74ea94342f32

Catalogue record

Date deposited: 13 May 2009

Last modified: 13 Mar 2024 18:13

Contributors

Author: Anthony A. Demco

Thesis advisor: Craig Saunders
