The University of Southampton
University of Southampton Institutional Repository

Dimensionality Reduction and Representation for Nearest Neighbour Learning

Dimensionality Reduction and Representation for Nearest Neighbour Learning
Dimensionality Reduction and Representation for Nearest Neighbour Learning
An increasing number of intelligent information agents employ Nearest Neighbour learning algorithms to provide personalised assistance to the user. This assistance may be in the form of recognising or locating documents that the user might find relevant or interesting. To achieve this, documents must be mapped into a representation that can be presented to the learning algorithm. Simple heuristic techniques are generally used to identify relevant terms from the documents. These terms are then used to construct large, sparse training vectors. The work presented here investigates an alternative representation based on sets of terms, called set-valued attributes, and proposes a new family of Nearest Neighbour learning algorithms that utilise this set-based representation. The importance of discarding irrelevant terms from the documents is then addressed, and this is generalised to examine the behaviour of the Nearest Neighbour learning algorithm with high dimensional data sets containing such values. A variety of selection techniques used by other machine learning and information retrieval systems are presented, and empirically evaluated within the context of a Nearest Neighbour framework. The thesis concludes with a discussion of ways in which attribute selection and dimensionality reduction techniques may be used to improve the selection of relevant attributes, and thus increase the reliability and predictive accuracy of the Nearest Neighbour learning algorithm.
Payne, T.R.
e0956864-a64d-4333-b63b-e0dc4e8008b3
Payne, T.R.
e0956864-a64d-4333-b63b-e0dc4e8008b3

Payne, T.R. (1999) Dimensionality Reduction and Representation for Nearest Neighbour Learning. University of Aberdeen, Department of Computing Science, Doctoral Thesis.

Record type: Thesis (Doctoral)

Abstract

An increasing number of intelligent information agents employ Nearest Neighbour learning algorithms to provide personalised assistance to the user. This assistance may be in the form of recognising or locating documents that the user might find relevant or interesting. To achieve this, documents must be mapped into a representation that can be presented to the learning algorithm. Simple heuristic techniques are generally used to identify relevant terms from the documents. These terms are then used to construct large, sparse training vectors. The work presented here investigates an alternative representation based on sets of terms, called set-valued attributes, and proposes a new family of Nearest Neighbour learning algorithms that utilise this set-based representation. The importance of discarding irrelevant terms from the documents is then addressed, and this is generalised to examine the behaviour of the Nearest Neighbour learning algorithm with high dimensional data sets containing such values. A variety of selection techniques used by other machine learning and information retrieval systems are presented, and empirically evaluated within the context of a Nearest Neighbour framework. The thesis concludes with a discussion of ways in which attribute selection and dimensionality reduction techniques may be used to improve the selection of relevant attributes, and thus increase the reliability and predictive accuracy of the Nearest Neighbour learning algorithm.

Text
phd_thesis.pdf - Other
Download (1MB)

More information

Published date: 1999
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 257788
URI: http://eprints.soton.ac.uk/id/eprint/257788
PURE UUID: 2a185440-beea-4846-93c1-f208e69b622b

Catalogue record

Date deposited: 24 Jun 2003
Last modified: 29 Jan 2020 15:30

Export record

Download statistics

Downloads from ePrints over the past year. Other digital versions may also be available to download e.g. from the publisher's website.

View more statistics

Atom RSS 1.0 RSS 2.0

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.

We use cookies to ensure that we give you the best experience on our website. If you continue without changing your settings, we will assume that you are happy to receive cookies on the University of Southampton website.

×