Data Mining and Decision Support in Pharmaceutical Databases
Pasupa, Kitsuchart
26 November 2007
Pasupa, Kitsuchart (2007) Data Mining and Decision Support in Pharmaceutical Databases. University of Sheffield, Automatic Control & Systems Engineering, Doctoral Thesis.
Record type: Thesis (Doctoral)
Abstract
This thesis lies in the area of chemoinformatics known as virtual screening (VS). VS describes a set of computational methods that provide a fast and cheap alternative to biological screening, which involves the selection, synthesis and testing of molecules to ascertain their biological activity in a particular domain, e.g. pain relief or reduction of inflammation. This is important because reducing the cost and, crucially, the time spent in the early stages of compound development can have a disproportionate benefit for profitability in a cycle that has a short patent lifetime. Machine learning methods are becoming popular in this domain, but problems arise when 2D fingerprints are used as descriptors. Fingerprints are an extremely sparse, binary-valued representation of molecules. Furthermore, VS suffers strongly from the so-called "small-sample-size" problem, where the number of covariates is comparable to or exceeds the number of samples. These problems can be solved by developing machine learning algorithms that can handle very large sets of high-dimensional data. High-dimensional data contains an unprecedented level of complexity, so some form of complexity control is necessary. Alternatively, a suitable dimensionality reduction method can be used. This thesis consists of four major pieces of work, all conducted with the MDL Drug Data Report (MDDR) database: (i) development of binary kernel discrimination (BKD); (ii) introduction of a new algorithm for the kernel machine family, the so-called "parsimonious kernel Fisher discrimination", which is then applied to VS tasks; (iii) prediction by posterior estimation in VS; (iv) a comparison of four variants of principal component analysis with potential in VS. The experiments show that BKD in conjunction with the Jaccard/Tanimoto coefficient is the best method, while the other approaches are less accurate than BKD but still comparable in a number of cases.
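As a brief illustration of the Jaccard/Tanimoto measure mentioned in the abstract, the sketch below computes it for two sparse binary fingerprints. It is not taken from the thesis: the function name `tanimoto`, the representation of a fingerprint as a set of on-bit indices, the example bit positions, and the convention for two empty fingerprints are all assumptions made for illustration.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Jaccard/Tanimoto coefficient between two sets of on-bit indices."""
    if not a and not b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    common = len(a & b)  # bits set in both fingerprints
    # Size of the union = |a| + |b| - |a AND b|
    return common / (len(a) + len(b) - common)


# Hypothetical fingerprints: only the indices of the bits that are set are
# stored, which suits an extremely sparse binary representation.
fp_query = frozenset({3, 17, 42, 256, 1001})
fp_candidate = frozenset({3, 42, 77, 1001})

similarity = tanimoto(fp_query, fp_candidate)
print(similarity)  # 3 shared bits out of 6 distinct bits -> 0.5
```

Storing only the on-bit indices keeps the cost of each comparison proportional to the number of set bits rather than to the full fingerprint length.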
This record has no associated files available for download.
More information
Published date: 26 November 2007
Organisations: Electronics & Computer Science
Identifiers
Local EPrints ID: 266583
URI: http://eprints.soton.ac.uk/id/eprint/266583
PURE UUID: 2dcb8864-f084-4039-b755-63a4b32617dc
Catalogue record
Date deposited: 20 Aug 2008 11:51
Last modified: 10 Dec 2021 22:19
Contributors
Author: Kitsuchart Pasupa