The University of Southampton
University of Southampton Institutional Repository

Data Mining and Decision Support in Pharmaceutical Databases

Pasupa, Kitsuchart

Pasupa, Kitsuchart (2007) Data Mining and Decision Support in Pharmaceutical Databases. University of Sheffield, Automatic Control & Systems Engineering, Doctoral Thesis.

Record type: Thesis (Doctoral)

Abstract

This thesis lies in virtual screening (VS), an area of chemoinformatics. VS describes a set of computational methods that provide a fast and cheap alternative to biological screening, which involves the selection, synthesis and testing of molecules to ascertain their biological activity in a particular domain, e.g. pain relief or reduction of inflammation. This is important because reducing the cost and, crucially, the time spent in the early stages of compound development can have a disproportionate benefit for profitability in a cycle that has a short patent lifetime. Machine learning methods are becoming popular in this domain, but problems arise when 2D fingerprints are used as descriptors, because fingerprints are an extremely sparse, binary-valued representation of molecules. Furthermore, VS suffers strongly from the so-called "small-sample-size" problem, where the number of covariates is comparable to or exceeds the number of samples. These problems can be addressed by developing machine learning algorithms that can handle very large sets of high-dimensional data. Because such high-dimensional data contains a very high level of complexity, some form of complexity control is necessary; alternatively, a suitable dimensionality reduction method can be used. This thesis consists of four major pieces of work, all conducted with the MDL Drug Data Report (MDDR) database: (i) development of binary kernel discrimination (BKD); (ii) a new algorithm for the kernel machine family, the so-called "parsimonious kernel Fisher discrimination", which is then applied to VS tasks; (iii) prediction by posterior estimation in VS; and (iv) a comparison of four variants of principal component analysis with potential in VS. The experiments show that BKD in conjunction with the Jaccard/Tanimoto coefficient is the best method, while the other approaches are less accurate than BKD but still comparable in a number of cases.
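
As a rough illustration of the Jaccard/Tanimoto coefficient mentioned in the abstract, the sketch below computes it for two sparse binary fingerprints. It is not taken from the thesis: the function name `tanimoto`, the representation of a fingerprint as a set of on-bit indices, the example bit positions, and the convention for two empty fingerprints are all assumptions made for this example.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Jaccard/Tanimoto coefficient of two binary fingerprints.

    Each fingerprint is given as the set of indices of its on-bits,
    which suits extremely sparse binary vectors.
    """
    if not a and not b:
        # Assumed convention: two all-zero fingerprints count as identical.
        return 1.0
    shared = len(a & b)  # bits set in both fingerprints
    # |A union B| = |A| + |B| - |A intersect B|
    return shared / (len(a) + len(b) - shared)


# Hypothetical fingerprints: 2 shared on-bits, 5 distinct on-bits in total.
fp_a = {3, 17, 42, 108}
fp_b = {3, 42, 256}
print(tanimoto(fp_a, fp_b))
```

Storing only the on-bit indices keeps the cost proportional to the number of set bits rather than to the full fingerprint length, which may matter when the fingerprints are as sparse as the abstract describes.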

This record has no associated files available for download.

More information

Published date: 26 November 2007
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 266583
URI: http://eprints.soton.ac.uk/id/eprint/266583
PURE UUID: 2dcb8864-f084-4039-b755-63a4b32617dc

Catalogue record

Date deposited: 20 Aug 2008 11:51
Last modified: 10 Dec 2021 22:19

Contributors

Author: Kitsuchart Pasupa
