Thursday, February 12, 2009

Context is the 'glue' that binds objects in coherent scenes.

This is not my own quote. It is one of my favorites from Moshe Bar, and it comes from his paper "Visual Objects in Context."

I have recently been giving context (in the domain of scene understanding) some heavy thought.
While Bar's paper is good, the one I wanted to focus on goes back to the 1980s. According to Bar, the following paper (which I wish everybody would at least skim) is "a seminal study that characterizes the rules that govern a scene's structure and their influence on perception."

Biederman, I., Mezzanotte, R. J. & Rabinowitz, J. C. Scene perception: detecting and judging objects undergoing relational violations. Cogn. Psychol. 14, 143–177 (1982).

Biederman outlines five object relations. The three semantic relations (those that depend on object categories) are probability, position, and familiar size. The two syntactic relations (those that do not operate at the object category level) are interposition and support. According to Biederman, "these relations might constitute a sufficient set with which to characterize the organizations of a real-world scene as distinct from a display of unrelated objects." The world has structure, and characterizing this structure in terms of such rules is quite a noble effort.

A very interesting question that Biederman addresses is the following: do humans reason about syntactic relations before semantic relations, or the other way around? A Gibsonian (direct perception) way of thinking about the world is that the processing of depth and space precedes the assignment of identity to the stuff that occupies the empty space around us. J.J. Gibson's view is in accordance with Marr's pipeline.

However, Biederman's study with human subjects (he is a psychologist) suggests that information about semantic relationships among objects is not accessed only after semantic-free physical relationships have been processed. Quoting him directly, "Instead of a 3D parse being the initial step, the pattern recognition of the contours and the access to semantic relations appear to be the primary stages" as well as "further evidence that an object's semantic relations to other objects are processed simultaneously with its own identification."

Now that I've whetted your appetite, let's bring out the glue.

P.S. Moshe Bar was a student of I. Biederman and S. Ullman (author of "Against Direct Perception").

Thursday, February 05, 2009

SUNS 2009 cool talks

This past Friday I went to SUNS 2009 at MIT, and in my opinion the coolest talks were by Aude Oliva, David Forsyth, and Ce Liu.

While I will not summarize their talks, which referred to unpublished work, I will provide a few high-level questions that summarize (according to me) the ideas conveyed by these speakers.

Aude: What is the interplay between scene-level context, local category-specific features, and category-independent saliency that makes us explore images in a certain way when looking for objects?

David: Is naming all the objects depicted in an image the best way to understand it? Don't we really want some type of understanding that will allow us to reason about never-before-seen objects?

Ce: Can we understand the space of all images by cleverly interpolating between what we are currently perceiving and what we have seen in the past?