Thursday, November 05, 2009

The Visual Memex: Visual Object Recognition Without Categories


Figure 1
I have discussed the limitations of using rigid object categories in computer vision, and my CVPR 2008 work on Recognition as Association was a move towards developing a category-free model of objects. I was primarily concerned with local object recognition, where the recognition problem was driven by the appearance/shape/texture features derived from within a segment (a region extracted from an image using an image segmentation algorithm). Recognition of objects was done locally and independently per region, since I did not have a good model of category-free context at that time. I've given the problem of contextual object reasoning much thought over the past several years, and equipped with the power of graphical models and learning algorithms, I now present a model for category-free object relationship reasoning.

Now it's 2009, and it's no surprise that I have a paper on context. Context is the new beast and all the cool kids are using it for scene understanding; however, categories are used so often for this problem that their use is rarely questioned. In my NIPS 2009 paper, I present a category-free model of object relationships and address the problem of context-only recognition, where the goal is to recognize an object solely based on contextual cues. Figure 1 shows an example of such a prediction task. Given K objects and their spatial configuration, is it possible to predict the appearance of a hidden object at some spatial location?

Figure 2


I present a model called the Visual Memex (visualized in Figure 2), which is a non-parametric graph-based model of visual concepts and their interactions. Unlike traditional approaches to object-object modeling, which learn potentials between every pair of categories (so the number of pairs grows quadratically with the number of categories), I make no category assumptions for context.
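To make the idea of a category-free context graph a little more concrete, here is a toy sketch in Python. It is not the model from the paper: the class names (Exemplar, ToyMemex), the Gaussian weighting, the two sigma parameters, and the made-up 2D appearance descriptors are all assumptions chosen for illustration. The only things it takes from the description above are that nodes are individual object instances rather than categories, that edges record 2D spatial relationships between objects seen together, and that the task is to guess what belongs at a hidden location from the visible objects alone.

```python
from dataclasses import dataclass, field
from math import exp


@dataclass
class Exemplar:
    """One object instance. The name is only a label for printing;
    the toy predictor never uses it as a category."""
    name: str
    appearance: tuple  # hypothetical appearance descriptor
    center: tuple      # (x, y) in normalized image coordinates


@dataclass
class ToyMemex:
    """Graph whose nodes are exemplars and whose edges store spatial offsets."""
    exemplars: list = field(default_factory=list)
    context_edges: dict = field(default_factory=dict)  # i -> [(j, (dx, dy))]

    def add_scene(self, objects):
        """Add the objects of one image and link every co-occurring pair."""
        start = len(self.exemplars)
        self.exemplars.extend(objects)
        ids = range(start, start + len(objects))
        for i in ids:
            for j in ids:
                if i == j:
                    continue
                a, b = self.exemplars[i], self.exemplars[j]
                offset = (b.center[0] - a.center[0], b.center[1] - a.center[1])
                self.context_edges.setdefault(i, []).append((j, offset))


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def gaussian(d2, sigma):
    """Unnormalized Gaussian weight for a squared distance."""
    return exp(-d2 / (2.0 * sigma ** 2))


def predict_hidden(memex, visible, hidden_center, app_sigma=0.2, pos_sigma=0.1):
    """Rank stored exemplars as guesses for the hidden object.

    Each visible object is softly matched to stored exemplars by appearance.
    Every match then votes for its stored neighbors, weighted by how well the
    stored offset agrees with the offset from the visible object to the
    hidden location.
    """
    votes = {}
    for v in visible:
        wanted = (hidden_center[0] - v.center[0], hidden_center[1] - v.center[1])
        for i, stored in enumerate(memex.exemplars):
            w_app = gaussian(sq_dist(v.appearance, stored.appearance), app_sigma)
            for j, offset in memex.context_edges.get(i, []):
                w_pos = gaussian(sq_dist(offset, wanted), pos_sigma)
                votes[j] = votes.get(j, 0.0) + w_app * w_pos
    ranked = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)
    return [(memex.exemplars[j], score) for j, score in ranked]


# Two tiny "training" scenes with invented descriptors and positions.
memex = ToyMemex()
memex.add_scene([
    Exemplar("road", (0.2, 0.2), (0.5, 0.9)),
    Exemplar("car", (0.9, 0.1), (0.5, 0.7)),
])
memex.add_scene([
    Exemplar("sky", (0.1, 0.9), (0.5, 0.1)),
    Exemplar("building", (0.5, 0.5), (0.5, 0.4)),
])

# Query: one visible road-like region, and a hidden region just above it.
visible = [Exemplar("query", (0.25, 0.2), (0.4, 0.9))]
for exemplar, score in predict_hidden(memex, visible, hidden_center=(0.4, 0.7)):
    print(f"{exemplar.name}: {score:.3f}")
```

The point of the sketch is that the prediction is an exemplar (with its own appearance), not a category label, and that no per-category-pair parameters are ever learned. Adding more scenes only adds nodes and edges to the graph.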

The official paper is out, and can be found on my project page:

Tomasz Malisiewicz, Alexei A. Efros. Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships. In NIPS, December 2009. PDF

Abstract: The use of context is critical for scene understanding in computer vision, where the recognition of an object is driven by both local appearance and the object's relationship to other elements of the scene (context). Most current approaches rely on modeling the relationships between object categories as a source of context. In this paper we seek to move beyond categories to provide a richer appearance-based model of context. We present an exemplar-based model of objects and their relationships, the Visual Memex, that encodes both local appearance and 2D spatial context between object instances. We evaluate our model on Torralba's proposed Context Challenge against a baseline category-based system. Our experiments suggest that moving beyond categories for context modeling appears to be quite beneficial, and may be the critical missing ingredient in scene understanding systems.

I gave a talk about my work yesterday at CMU's Misc-read and received some good feedback. I'll be at NIPS this December representing this body of research.