Designing a resource-allocating codebook for patch-based visual object recognition
Ramanan, Amirthalingam (2010) Designing a resource-allocating codebook for patch-based visual object recognition. University of Southampton, School of Electronics and Computer Science, Doctoral Thesis, 130pp.
The state-of-the-art approach in visual object recognition is the use of local information extracted at several points or image patches from an image. Local information at specific points can deal with object shape variability and partial occlusions. The underlying idea is that, in different images, the statistical distribution of the patches is different, which can be effectively exploited for recognition. In such a patch-based object recognition system, the key role of a visual codebook is to provide a way to map the low-level features into a fixed-length vector in histogram space to which standard classifiers can be directly applied. The discriminative power of a visual codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. Codebook construction is therefore a central step, and it is usually done by cluster analysis. However, clustering is a process that retains regions of high density in a distribution, and it follows that the resulting codebook need not have discriminant properties. Clustering is also recognised as a computational bottleneck of such systems.
This thesis demonstrates a novel approach, which we call the resource-allocating codebook (RAC), for constructing a discriminant codebook in a one-pass design procedure inspired by the resource-allocation network family of algorithms. The RAC approach slightly outperforms more traditional approaches because it tends to spread the cluster centres over a broader range of the feature space, thereby including more rare low-level features in the codebook than density-preserving clustering-based codebooks do. Our algorithm achieves this performance at drastically reduced computing times because, apart from an initial scan through a small subset to determine length scales, each data item is processed only once.
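The one-pass idea can be illustrated with a minimal sketch. It assumes a simple novelty rule that the abstract does not spell out: a descriptor becomes a new codeword when its Euclidean distance to every existing codeword exceeds a fixed radius, and is otherwise skipped. The function name `build_codebook` and the parameter `radius` are hypothetical, and in practice the radius would come from the initial scan that determines length scales.

```python
import math


def build_codebook(descriptors, radius):
    """One-pass codebook sketch (assumed rule, not the thesis's exact algorithm).

    A descriptor is added as a new codeword when it lies farther than
    `radius` from every codeword collected so far; otherwise it is skipped.
    Each descriptor is visited exactly once.
    """
    codebook = []
    for x in descriptors:
        # The first descriptor always seeds the codebook.
        if not codebook or min(math.dist(x, c) for c in codebook) > radius:
            codebook.append(tuple(x))
    return codebook


# Toy 2-D "descriptors": two dense groups and one isolated point.
toy = [(0, 0), (0.1, 0), (5, 5), (5.2, 5), (0, 0.2), (10, 0)]
codebook = build_codebook(toy, radius=1.0)
```

Under this rule the isolated point at `(10, 0)` receives its own codeword even though it occurs only once, which is the kind of coverage of rare features that a density-preserving clustering would tend to miss.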
We illustrate some properties of our method and compare it to a closely related approach known as the mean-shift clustering technique. A pruning strategy has been employed to tackle a few outliers when assigning each feature in images to the closest codeword to create a histogram representation for each image. Features whose distance from the closest codeword exceeds an empirical distance maximum are discarded. A recognition system that learns incrementally with training images, and in which the output classifier accounts for class-specific discriminant features, is also presented. Furthermore, we present an approach which, instead of clustering, adaptively constructs a codebook by computing Fisher scores between the classes of interest.
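The pruned histogram assignment described above can be sketched as follows. This is an illustration under stated assumptions, not the thesis's implementation: distances are Euclidean, and the names `encode_histogram` and `max_distance` (standing in for the empirical distance maximum) are hypothetical.

```python
import math


def encode_histogram(descriptors, codebook, max_distance):
    """Count, per codeword, the descriptors whose nearest codeword it is.

    Descriptors farther than `max_distance` from their nearest codeword
    are treated as outliers and left out of the histogram.
    """
    hist = [0] * len(codebook)
    for x in descriptors:
        dists = [math.dist(x, c) for c in codebook]
        nearest = min(range(len(codebook)), key=dists.__getitem__)
        if dists[nearest] <= max_distance:
            hist[nearest] += 1
    return hist


# Toy example: the last descriptor is far from both codewords and is pruned.
toy_codebook = [(0, 0), (5, 5)]
toy_image = [(0.1, 0), (4.9, 5), (5, 5.1), (20, 20)]
hist = encode_histogram(toy_image, toy_codebook, max_distance=1.0)
```

The result always has one bin per codeword, so images with different numbers of patches map to vectors of the same length, which a standard classifier can consume directly.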
This thesis also demonstrates a novel sequential hierarchical clustering technique that initially builds a hierarchical tree from a small subset of the data, while the remaining data are processed sequentially and the tree adapted constructively. Evaluations performed with this approach show that the performance is comparable while reducing the computational needs. Finally, during the process of classification, we demonstrate a new learning architecture for multi-class classification tasks using support vector machines. This technique is faster in testing compared to directed acyclic graph (DAG) SVMs, while maintaining comparable performance to the standard multi-class classification techniques.
Item Type: Thesis (Doctoral)
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: University Structure - Pre August 2011 > School of Electronics and Computer Science > Information - Signals, Images, Systems
Date Deposited: 16 Jul 2010 11:47
Last Modified: 08 Jun 2012 12:53
Contributors: Ramanan, Amirthalingam (Author); Niranjan, Mahesan (Thesis advisor)