OpenIMAJ – Intelligent Multimedia Analysis in Java

Authors: Jonathon Hare, Sina Samangooei and David Dupplaw, University of Southampton, UK

More information: http://www.openimaj.org

Introduction

Multimedia analysis is an exciting and fast-moving research area. Unfortunately, historically there has been a lack of software solutions in a common programming language for performing scalable integrated analysis of all modalities of media (images, videos, audio, text, web-pages, etc). For example, in the image analysis world, OpenCV and Matlab are commonly used by researchers, whilst many common Natural Language Processing tools are built using Java. The lack of coherency between these tools and languages means that it is often difficult to research and develop rational, comprehensible and repeatable software implementations of algorithms for performing multimodal multimedia analysis. These problems are also exacerbated by the lack of any principled software engineering (separation of concerns, minimised code repetition, maintainability, understandability, premature optimisation and over optimisation) often found in research code. OpenIMAJ is a set of libraries and tools for multimedia content analysis and content generation that aims to fill this gap and address the concerns. OpenIMAJ provides a coherent interface to a very broad range of techniques, and contains everything from state-of-the-art computer vision (e.g. SIFT descriptors, salient region detection, face detection and description, etc.) and advanced data clustering and hashing, through to software that performs analysis on the content, layout and structure of webpages. A full list of the all the modules and an overview of their functionalities for the latest OpenIMAJ release can be found here. OpenIMAJ is primarily written in Java and, as such is completely platform independent. The video-capture and hardware libraries contain some native code but Linux, OSX and Windows are supported out of the box (under both 32 and 64 bit JVMs; ARM processors are also supported under Linux). It is possible to write programs that use the libraries in any JVM language that supports Java interoperability, for example Groovy and Scala. OpenIMAJ can even be run on Android phones and tablets. As it’s written using Java, you can run any application built using OpenIMAJ on any of the supported platforms without even having to recompile the code.

Some simple programming examples

The following code snippets and illustrations aim to give you an idea of what programming with OpenIMAJ is like, whilst showing some of the powerful features.

Face detection, keypoint localisation and model fitting
... // A simple Haar-Cascade face detector HaarCascadeDetector det1 = new HaarCascadeDetector(); DetectedFace face1 = det1.detectFaces(img).get(0); new SimpleDetectedFaceRenderer().drawDetectedFace(mbf,10,face1); // Get the facial keypoints FKEFaceDetector det2 = new FKEFaceDetector(); KEDetectedFace face2 = det2.detectFaces(img).get(0); new KEDetectedFaceRenderer().drawDetectedFace(mbf,10,face2); // With the CLM Face Model CLMFaceDetector det3 = new CLMFaceDetector(); CLMDetectedFace face3 = det3.detectFaces(img).get(0); new CLMDetectedFaceRenderer().drawDetectedFace(mbf,10,face3); ...
... // Find the features DoGSIFTEngine eng = new DoGSIFTEngine(); List sourceFeats = eng.findFeatures(sourceImage); List targetFeats = eng.findFeatures(targetImage); // Prepare the matcher final HomographyModel model = new HomographyModel(5f); final RANSAC ransac = new RANSAC(model, 1500, new RANSAC.BestFitStoppingCondition(), true); ConsistentLocalFeatureMatcher2d matcher = new ConsistentLocalFeatureMatcher2d (new FastBasicKeypointMatcher(8)); // Match the features matcher.setFittingModel(ransac); matcher.setModelFeatures(sourceFeats); matcher.findMatches(targetFeats); ...
Finding and matching SIFT keypoints
... // Access First Webcam VideoCapture cap = new VideoCapture(640, 480); //grab a frame MBFImage last = cap.nextFrame().clone(); // Process Video VideoDisplay.createOffscreenVideoDisplay(cap) .addVideoListener( new VideoDisplayListener() { public void beforeUpdate(MBFImage frame) { frame.subtractInplace(last).abs(); last = frame.clone(); } ...
Webcam access and video processing

The OpenIMAJ design philosophy

One of the main goals in the design and implementation of OpenIMAJ was to keep all components as modular as possible, providing a clear separation of concerns whilst maximising code reusability, maintainability and understandability. At the same time, this makes the code easy to use and extend. For example, the OpenIMAJ difference-of-Gaussian SIFT implementation allows different parts of the algorithm to be replaced or modified at will without having to modify the source-code of the existing components; an example of this is our min-max SIFT implementation [1], which allows more efficient clustering of SIFT features by exploiting the symmetry of features detected at minima and maxima of the scale-space. Implementations of commonly used algorithms are also made as generic as possible; for example, the OpenIMAJ RANSAC implementation works with generic Modelobjects and doesn’t care whether the specific model implementation is attempting to fit a homography to a set of point-pair matches or a straight line to samples in a space. Primitive media types in OpenIMAJ are also kept as simple as possible: Images are just an encapsulation of a 2D-arrays of pixels; Videos are just encapsulated iterable collections/streams of images; Audio is just an encapsulated array of samples. The speed of individual algorithms in OpenIMAJ has not been a major development focus, however OpenIMAJ can not be called slow. For example, most of the algorithms implemented in both OpenIMAJ and OpenCV run at similar rates, and things such as SIFT detection and face detection can be run in real-time. Whilst the actual algorithm speed has not been a particular design focus, scalability of the algorithms to massive datasets has. Because OpenIMAJ is written in Java, it is trivial to integrate it with tools for distributed data processing, such as Apache Hadoop. Using the OpenIMAJ Hadoop tools [3] on our small Hadoop cluster, we have extracted and indexed visual term features from datasets with sizes in excess of 50 million images. The OpenIMAJ clustering implementations are able to cluster larger-than-memory datasets by reading data from disk as necessary.

A history of OpenIMAJ

OpenIMAJ was first made public in May 2011, just in time to be entered into the 2011 ACM Multimedia Open-Source Software Competition [2] which it went on to win. OpenIMAJ was not written overnight however. As shown in the following picture, parts of the original codebase came from projects as long ago as 2005. Initially, the features were focused around image analysis, with a concentration on image features used for CBIR (i.e. global histogram features), features for image matching (i.e. SIFT) and simple image classification (i.e. cityscape versus landscape classification).

A visual history of OpenIMAJ

As time went on, the list of features began to grow; firstly with more implementations of image analysis techniques (i.e. connected components, shape analysis, scalable bags-of-visual-words, face detection, etc). This was followed by support for analysing more types of media (video, audio, text, and web-pages), as well as implementations of more general techniques for machine learning and clustering. In addition, support for various hardware devices and video capture was added. Since its initial public release, the community of people and organisations using OpenIMAJ has continued to grow, and includes a number of internationally recognised companies. We also have an active community of people reporting (and helping to fix) any bugs or issues they find, and suggesting new features and improvements. Last summer, we had a single intern working with us, using and developing new features (in particular with respect to text analysis and mining functionality). This summer we’re expecting two or three interns who will help us leverage OpenIMAJ in the 2013 MediaEvalcampaign. From the point-of-view of the software itself, the number of features in OpenIMAJ continues to grow on an almost daily basis. Since the initial release, the core codebase has become much more mature and we’ve added new features and implementations of algorithms throughout. We’ve picked a couple of the highlights from the latest release version and the current development version below:

Reference Annotations

As academics we are quite used to the idea of throughly referencing the ideas and work of others when we write a paper. Unfortunately, this is not often carried forward to other forms of writing, such as the writing of the code for computer software. Within OpenIMAJ, we implement and expand upon much of our own published work, but also the published work of others. For the 1.1 release of OpenIMAJ we decided that we wanted to make it explicit where the idea for an implementation of each algorithm and technique came from. Rather than haphazardly adding references and citations in the Javadoc comments, we decided that the process of referencing should be more formal, and that the references should be machine readable. These machine-readable references are automatically inserted into the generated documentation, and can also be accessed programatically. It’s even possible to automatically generate a bibliography of all the techniques used by any program built on top of OpenIMAJ. For more information, take a look at thisblog post. The reference annotations are part of a bigger framework currently under development that aims to encourage better code development for experimentation purposes. The overall aim of this is to provide the basis for repeatable software implementations of experiments and evaluations, with automatic gathering of the basic statistics that all experiments should have, together with more specific statistics based on the type of evaluation (i.e. ROC statistics for classification experiments; TREC-style Precision-Recall for information retrieval experiments, etc).

Stream Processing Framework

Processing streaming data is a hot topic currently. We wanted to provide a way in OpenIMAJ to experiment with the analysis of streaming multimedia data (see the description of the “Twitter’s visual pulse” application below for example). The OpenIMAJ Streamclasses in the development trunk of OpenIMAJ provide a way to effectively gather, consume, process and analyse streams of data. For example, in just a few lines of code it is possible to get and display all the images from the live Twitter sample stream:

  //construct your twitter api key
  TwitterAPIToken token = ...

  // Create a twitter dataset instance connected to the live twitter sample stream
  StreamingDataset<Status> dataset = new TwitterStreamingDataset(token, 1);

  //use the Stream#map() method to transform the stream so we get images
  dataset
    //process tweet statuses to produce a stream of URLs
    .map(new TwitterLinkExtractor())
    //filter URLs to just get those that are URLs of images
    .map(new ImageURLExtractor())
    //consume the stream and display images
    .forEach(new Operation<URL>() {
      public void perform(URL url) {
        DisplayUtilities.display(ImageUtilities.readMBF(url));
      }
    });

The stream processing framework handles a lot of the hard-work for you. For example it can optionally drop incoming items if you are unable to consume the stream at a fast enough rate (in this case it will gather statistics about what it’s dropped). In addition to the Twitter live stream, we’ve provided a number of other stream source implementations, including the one based on the Twitter search API and one based on IRC chat. The latter was used to produce a simple visualisation of a world map that shows where current Wikipedia edits are currently happening.

Improved face pipeline

The initial OpenIMAJ release contained some support for face detection and analysis, however, this has been and continues to be improved. The key advantage OpenIMAJ has over other libraries such as OpenCV in this area is that it implements a complete pipeline with the following components:

Face Detection
Face Alignment
Facial Feature Extraction
Face Recognition/Classification

Each stage of the pipeline is configurable, and OpenIMAJ contains a number of different algorithm implementations for each stage as well as offering the possibility to easily implement more. The pipeline is designed to allow researchers to focus on a specific area of the pipeline without having to worry about the other components. At the same time, it is fairly easy to modify and evaluate a complete pipeline. In addition to the parts of the recognition pipeline, OpenIMAJ also includes code for tracking faces in videos and comparing the similarity of faces.

Improved audio processing & analysis functionality

When OpenIMAJ was first made public, there was little support for audio processing and analysis beyond playback, resampling and mixing. As OpenIMAJ has matured, the audio analysis components have grown, and now include standard audio feature extractors for things such as Mel-Frequency Cepstrum Coefficients (MFCCs), and higher level analysers for performing tasks such as beat detection, and determining if an audio sample is human speech. In addition, we’ve added a large number of generation, processing and filtering classes for audio signals, and also provided an interface between OpenIMAJ audio objects and the CMU Sphinx speech recognition engine.

Example applications

Every year our research group holds a 2-3 day Hackathon where we stop normal work and form groups to do a mini-project. For the last two years we’ve built applications using OpenIMAJ as the base. We’ve provided a short description together with some links so that you can get an idea of the varied kinds of application OpenIMAJ can be used to rapidly create.

Southampton Goggles

In 2011 we built “Southampton Goggles”. The ultimate aim was to build a geo-localisation/geo-information system based on content-based matching of images of buildings on the campus taken with a mobile device; the idea was that one could take a photo of a building as a query, and be returned relevant information about that building as a response (i.e. which faculty/school is located in it, whether there are vending machines/cafe’s in the building, the opening times of the building, etc). The project had two parts: the first part was data collection in order to collect and annotate the database of images which we would match against. The second part involved indexing the images, and making the client and server software for the search engine. In order to rapidly collect images of the campus, we built a hand-portable streetview like camera device with 6 webcams, a GPS and compass. The software for controlling this used OpenIMAJ to interface with all the hardware and record images, location and direction at regular time intervals. The camera rig and software are shown below:

The Southampton Goggles Capture Rig

The Southampton Goggles Capture Software, built using OpenIMAJ

For the second part of the project, we used the SIFT feature extraction, clustering and quantisation abilities of OpenIMAJ to build visual-term representations of each image, and used our ImageTerrier software [3,4] to build an inverted index which could be efficiently queried. For more information on the project, see this blog post.

Twitter’s visual pulse

Last year, we decided that for our mini-project we’d explore the wealth of visual information on Twitter. Specifically we wanted to look at which images were trending based not on counts of repeated URLs, but on the detection of near-duplicate images hosted at different URLs. In order to do this, we used what has now become the OpenIMAJ stream processing framework, described above, to:

ingest the Twitter sample stream,
process the tweet text to find links,
filter out links that weren’t images (based on a set of patterns for common image hosting sites),
download and resample the images,
extract sift features,
use locality sensitive hashing to sketch each SIFT feature and store in an ensemble of temporal hash-tables.

This process happens continuously in real-time. At regular intervals, the hash-tables are used to build a duplicates graph, which is then filtered and analysed to find the largest clusters of duplicate images, which are then visualised. OpenIMAJ was used for all the constituent parts of the software: stream processing, feature extraction and LSH. The graph construction and filtering uses the excellent JGraphT library that is integrated into the OpenIMAJ core-math module. For more information on the “Twitter’s visual pulse” application, see the paper [5] and this video.

Erica the Rhino

This year, we’re involved in a longer-running hackathon activity to build an interactive artwork for a mass public art exhibition called Go! Rhinos that will be held throughout Southampton city centre over the summer. The Go! Rhinos exhibition features a large number of rhino sculptures that will inhabit the streets and shopping centres of Southampton. Our school has sponsored a rhino sculpture called Erica which we’ve loaded with Raspberry Pi computers, sensors and physical actuators. Erica is still under construction, as shown in the picture below:

Erica, the OpenIMAJ-powered interactive rhino sculpture

OpenIMAJ is being used to provide visual analysis from the webcams that we’ve installed as eyes in the rhino sculpture (shown below). Specifically, we’re using a Java program built on top of the OpenIMAJ libraries to perform motion analysis, face detection and QR-code recognition. The rhino-eyeprogram runs directly on a Raspberry Pi mounted inside the sculpture.

Erica’s eye is a servo-mounted webcam, powered by software written using OpenIMAJ and running on a Raspberry Pi

For more information, check out Erica’s website and YouTube channel, where you can see a prototype of the OpenIMAJ-powered eye in action.

Conclusions

For software developers, the OpenIMAJ library facilitates the rapid creation of multimedia analysis, indexing, visualisation and content generation tools using state-of-the-art techniques in a coherent programming model. The OpenIMAJ architecture enables scientists and researchers to easily experiment with different techniques, and provides a platform for innovating new solutions to multimedia analysis problems. The OpenIMAJ design philosophy means that building new techniques and algorithms, combining different approaches, and extending and developing existing techniques, are all achievable. We welcome you to come and try OpenIMAJ for your multimedia analysis needs. To get started watch the introductory videos, try the tutorial, and look through some of the examples. If you have any questions, suggestions or comments, then don’t hesitate to get in contact.

Acknowledgements

Early work on the software that formed the nucleus of OpenIMAJ was funded by the European Unions 6th Framework Programme, the Engineering and Physical Sciences Research Council, the Arts and Humanities Research Council, Ordnance Survey and the BBC. Current development of the OpenIMAJ software is primarily funded by the European Union Seventh Framework Programme under the ARCOMEM and TrendMiner projects. The initial public releases were also funded by the European Union Seventh Framework Programme under the LivingKnowledge together with the LiveMemories project, funded by the Autonomous Province of Trento.

Papers

[1] Hare, Jonathon, Samangooei, Sina and Lewis, Paul (2011) Efficient clustering and quantisation of SIFT features: Exploiting characteristics of the SIFT descriptor and interest region detectors under image inversion. At The ACM International Conference on Multimedia Retrieval (ICMR 2011), Trento, Italy, 17 - 20 Apr 2011. ACM Press.
[2] Hare, Jonathon, Samangooei, Sina and Dupplaw, David (2011) OpenIMAJ and ImageTerrier: Java Libraries and Tools for Scalable Multimedia Analysis and Indexing of Images. At ACM Multimedia 2011, Scottsdale, Arizona, USA, 28 Nov - 01 Dec 2011. ACM, 691-694.
[3] Hare, Jonathon, Samangooei, Sina, Dupplaw, David and Lewis, Paul H. (2012) ImageTerrier: an extensible platform for scalable high-performance image retrieval. At ACM International Conference on Multimedia Retrieval (ICMR’12), Hong Kong, HK, 05 - 08 Jun 2012. 8pp.
[4] Hare, Jonathon, Samangooei, Sina and Lewis, Paul (2012) Practical scalable image analysis and indexing using Hadoop. Multimedia Tools and Applications, 1-34. (doi:10.1007/s11042-012-1256-0).
[5] Hare, Jonathon, Samangooei, Sina, Dupplaw, David and Lewis, Paul. (2013) Twitter’s visual pulse. In, the 3rd ACM conference on International conference on multimedia retrieval, Dallas, US, 2pp, 297-298. (doi:10.1145/2461466.2461514).