Practical scalable image analysis and indexing using Hadoop
The ability to handle very large amounts of image data is important for image analysis, indexing and retrieval applications. Sadly, in the literature, scalability aspects are often ignored or glossed over, especially with respect to the intricacies of actual implementation details.
In this paper we present a case study showing how a standard bag-of-visual-words image indexing pipeline can be scaled across a distributed cluster of machines. To achieve scalability, we investigate the optimal combination of hybridisations of the MapReduce distributed computational framework that allows the components of the analysis and indexing pipeline to be effectively mapped and run on modern server hardware. We then demonstrate the scalability of the approach in practice with a set of image analysis and indexing tools built on top of the Apache Hadoop MapReduce framework. The tools used for our experiments are freely available as open-source software, and the paper fully describes the nuances of their implementation.
1-34
Hare, Jonathon S.
Samangooei, Sina
Lewis, Paul H.
6 November 2012
Hare, Jonathon S., Samangooei, Sina and Lewis, Paul H. (2012) Practical scalable image analysis and indexing using Hadoop. Multimedia Tools and Applications. (doi:10.1007/s11042-012-1256-0).
Text: paper.pdf (Accepted Manuscript)
More information
Published date: 6 November 2012
Organisations: Web & Internet Science
Identifiers
Local EPrints ID: 344243
URI: http://eprints.soton.ac.uk/id/eprint/344243
ISSN: 1380-7501
PURE UUID: 8cf2c3b4-5d82-4543-bdec-8003707046cb
Catalogue record
Date deposited: 12 Nov 2012 12:10
Last modified: 15 Mar 2024 03:25
Contributors
Author: Jonathon S. Hare
Author: Sina Samangooei
Author: Paul H. Lewis