University of Southampton Institutional Repository

Designing a resource-allocating codebook for patch-based visual object recognition


Ramanan, Amirthalingam (2010) Designing a resource-allocating codebook for patch-based visual object recognition. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 130pp.

Record type: Thesis (Doctoral)

Abstract

The state-of-the-art approach in visual object recognition is the use of local information extracted at interest points or image patches. Local information at specific points can deal with object shape variability and partial occlusions. The underlying idea is that the statistical distribution of patches differs between images, and this difference can be effectively exploited for recognition. In such a patch-based object recognition system, the key role of a visual codebook is to map the low-level features into a fixed-length vector in histogram space, to which standard classifiers can be directly applied. The discriminative power of a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Codebook construction therefore plays a central role in determining the model's complexity, and it is usually carried out by cluster analysis. However, clustering retains regions of high density in a distribution, so the resulting codebook need not have discriminant properties. Clustering is also recognised as a computational bottleneck of such systems.

This thesis demonstrates a novel approach, which we call the resource-allocating codebook (RAC), for constructing a discriminant codebook in a one-pass design procedure inspired by the resource-allocating network family of algorithms. The RAC approach slightly outperforms more traditional approaches because it tends to spread the cluster centres over a broader range of the feature space, thereby including rare low-level features that density-preserving, clustering-based codebooks tend to miss. Our algorithm achieves this performance at drastically reduced computing time because, apart from an initial scan through a small subset to determine length scales, each data item is processed only once.
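
The following is a minimal sketch of the one-pass procedure as described above, written in plain NumPy. It assumes descriptors arrive as rows of an array; the function name, the choice of the median pairwise distance as the length scale, and the subset size are illustrative assumptions rather than details taken from the thesis.

```python
import numpy as np

def build_rac_codebook(descriptors, subset_size=500, seed=0):
    """Illustrative one-pass resource-allocating codebook.

    A length scale r is estimated from a small random subset (the only
    extra pass); afterwards every descriptor is visited exactly once:
    if it lies farther than r from all existing codewords it is
    allocated as a new codeword, otherwise it is discarded.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(descriptors, dtype=float)

    # Initial scan: estimate the length scale from pairwise distances in a subset.
    idx = rng.choice(len(X), size=min(subset_size, len(X)), replace=False)
    S = X[idx]
    sq = (S ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * S @ S.T, 0.0)
    r = np.median(np.sqrt(d2[np.triu_indices(len(S), k=1)]))

    codebook = [X[0]]
    for x in X[1:]:  # each remaining item is processed exactly once
        # (kept simple; a real implementation would cache the codeword array)
        d_min = np.sqrt(((np.asarray(codebook) - x) ** 2).sum(axis=1)).min()
        if d_min > r:                # novelty criterion
            codebook.append(x)       # allocate a new codeword
    return np.array(codebook), r
```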

We illustrate some properties of our method and compare it to a closely related approach, the mean-shift clustering technique. When each feature in an image is assigned to its closest codeword to create the image's histogram representation, a pruning strategy is employed to deal with outliers: features whose distance to the closest codeword exceeds an empirical distance maximum are ignored. A recognition system that learns incrementally from training images, with the output classifier accounting for class-specific discriminant features, is also presented. Furthermore, we address an approach which, instead of clustering, adaptively constructs a codebook by computing Fisher scores between the classes of interest.
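
A sketch of the histogram step with the pruning described above: d_max stands in for the empirical distance maximum, and the L1 normalisation at the end is an assumption added for illustration rather than a detail stated here.

```python
import numpy as np

def image_histogram(features, codebook, d_max):
    """Map one image's local features to a fixed-length histogram.

    Each feature is assigned to its closest codeword; features whose
    distance to that codeword exceeds d_max are treated as outliers
    and ignored.
    """
    F = np.asarray(features, dtype=float)
    C = np.asarray(codebook, dtype=float)

    # Squared Euclidean distances between all features and all codewords.
    d2 = (F ** 2).sum(1)[:, None] + (C ** 2).sum(1)[None, :] - 2.0 * F @ C.T
    d2 = np.maximum(d2, 0.0)

    nearest = d2.argmin(axis=1)
    d_near = np.sqrt(d2[np.arange(len(F)), nearest])

    hist = np.zeros(len(C))
    keep = d_near <= d_max               # prune outlying features
    np.add.at(hist, nearest[keep], 1.0)  # vote for the closest codeword

    total = hist.sum()                   # L1 normalisation (an assumption)
    return hist / total if total > 0 else hist
```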

This thesis also demonstrates a novel sequential hierarchical clustering technique that initially builds a hierarchical tree from a small subset of the data, while the remaining data are processed sequentially and the tree is adapted constructively. Evaluations show that this approach achieves comparable performance while reducing the computational cost. Finally, for the classification stage, we demonstrate a new learning architecture for multi-class classification tasks using support vector machines. This technique is faster at test time than directed acyclic graph (DAG) SVMs, while maintaining performance comparable to standard multi-class classification techniques.
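
As one plausible reading of the sequential hierarchical clustering idea (not the thesis's exact algorithm), the sketch below builds a small binary tree of cluster centres from an initial subset and then streams the remaining points through it, updating centres incrementally. The class and function names, and the 2-means splitting rule, are illustrative assumptions.

```python
import numpy as np

class Node:
    def __init__(self, centre):
        self.centre = centre          # running mean of points routed here
        self.count = 1
        self.left = self.right = None

def two_means(X, iters=10, seed=0):
    """Plain 2-means used only to split a node (illustrative)."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), size=2, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - c[None, :, :]) ** 2).sum(-1)
        lab = d.argmin(1)
        for j in (0, 1):
            if (lab == j).any():
                c[j] = X[lab == j].mean(0)
    return lab

def build_tree(X, min_size=20):
    """Build a binary tree of centres from a small initial subset."""
    X = np.asarray(X, dtype=float)
    node = Node(X.mean(0))
    node.count = len(X)
    if len(X) > min_size:
        lab = two_means(X)
        if (lab == 0).any() and (lab == 1).any():   # avoid empty splits
            node.left = build_tree(X[lab == 0], min_size)
            node.right = build_tree(X[lab == 1], min_size)
    return node

def insert(node, x):
    """Route one streamed point to a leaf, updating centres on the way."""
    node.count += 1
    node.centre += (x - node.centre) / node.count   # incremental mean
    if node.left is None:
        return node                                  # reached a leaf
    go_left = (np.linalg.norm(x - node.left.centre)
               <= np.linalg.norm(x - node.right.centre))
    return insert(node.left if go_left else node.right, x)
```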

Text: A.Ramanan_Ph.D._Thesis_2010.pdf (3MB)

More information

Published date: June 2010
Organisations: University of Southampton

Identifiers

Local EPrints ID: 159175
URI: http://eprints.soton.ac.uk/id/eprint/159175
PURE UUID: 50141d12-684d-4aff-a249-9effcc39f0a6
ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X

Catalogue record

Date deposited: 16 Jul 2010 11:47
Last modified: 14 Mar 2024 02:53

Contributors

Author: Amirthalingam Ramanan
Thesis advisor: Mahesan Niranjan

