The University of Southampton
University of Southampton Institutional Repository

Comparison and performance enhancement of modern pattern classifiers


Suppharangsan, Somjet (2010) Comparison and performance enhancement of modern pattern classifiers. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 144pp.

Record type: Thesis (Doctoral)

Abstract

This thesis is a critical empirical study, using a range of benchmark datasets, of the performance of some modern machine learning systems and of possible enhancements to them. When new algorithms and their performance are reported in the machine learning literature, most authors pay little attention to reporting the statistical significance of performance differences. We take Gaussian process classifiers as an example, for which the literature reports a disappointingly small number of performance evaluations. What is particularly neglected is any use of the uncertainties in the performance measures when making comparisons. This thesis makes a novel contribution by developing a methodology for formal comparisons that also includes performance uncertainties. Using support vector machines (SVMs) as the classification architecture, the thesis explores two potential enhancements aimed at complexity reduction: (a) subset selection on the training data by pre-processing, and (b) organising the classes of a multi-class problem in a tree structure for fast classification. The former is crucial, as dataset sizes are known to have increased rapidly, and straightforward training using quadratic programming over all of the given data is prohibitively expensive. While some researchers focus on training algorithms that operate in a stochastic manner, we explore data reduction by cluster analysis. Multi-class problems in which the number of classes is very large are of increasing interest. Our contribution is to speed up training by removing as much irrelevant data as possible while preserving the data points that are believed to be potential support vectors. The results show that too high a data reduction rate can degrade performance. However, on a subset of problems, the proposed methods have produced results comparable to those of the full SVM despite the high reduction rate. The new learning tree structure can then be combined with the data selection methods to obtain a further increase in speed.
Finally, we also critically review SVM classification problems in which the input data is binary. In the chemoinformatics and bioinformatics literature, the Tanimoto kernel has been empirically shown to have good performance. The work we present, using carefully set up synthetic data of varying dimensions and dataset sizes, casts doubt on such claims. Improvements are noticeable, but not to the extent claimed in previous studies.
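
For readers unfamiliar with the Tanimoto kernel mentioned above, the following is a minimal sketch of its standard definition for binary vectors: the number of positions where both vectors are 1, divided by the number of positions where at least one is 1. It is an illustration only and is not taken from the thesis. The function names `tanimoto` and `gram_matrix` are hypothetical, and returning 1.0 for two all-zero vectors is an assumed convention.

```python
def tanimoto(x, y):
    """Tanimoto similarity between two equal-length binary vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    # Positions set in both vectors: the inner product <x, y> for binary data.
    both = sum(1 for a, b in zip(x, y) if a and b)
    # Positions set in at least one vector: <x, x> + <y, y> - <x, y>.
    either = sum(1 for a, b in zip(x, y) if a or b)
    if either == 0:
        # Assumed convention: two all-zero vectors are treated as identical.
        return 1.0
    return both / either


def gram_matrix(rows):
    """Pairwise Tanimoto similarities for a list of binary vectors."""
    return [[tanimoto(a, b) for b in rows] for a in rows]


if __name__ == "__main__":
    fingerprints = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
    for row in gram_matrix(fingerprints):
        print(row)
```

A matrix built this way could, in principle, be supplied to an SVM implementation that accepts a precomputed kernel matrix; whether the thesis did so is not stated in this record.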

Text
S.Suppharangsan-PhDThesis2010.pdf - Other
Available under License University of Southampton Thesis Licence.

More information

Published date: November 2010
Organisations: University of Southampton

Identifiers

Local EPrints ID: 170393
URI: http://eprints.soton.ac.uk/id/eprint/170393
PURE UUID: 668e4a4c-79c1-4167-8f67-803a42de6d97
ORCID for M. Niranjan: orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 18 Jan 2011 16:41
Last modified: 14 Mar 2024 02:53

Contributors

Author: Somjet Suppharangsan
Thesis advisor: M. Niranjan
