Comments on Tombone's Computer Vision Blog: Understanding the role of categories in object recognition

The analogy between typing in programming language...

2010-02-09T18:31:08.473-05:00

The analogy between typing in programming languages and the use of categories in knowledge representation is very interesting.

A statically typed language such as C++ requires each variable to have a declared type (whether built-in, such as int, or user-defined), which helps catch bugs and lets the compiler produce faster programs. Dynamically typed languages allow a variable to hold an integer one second and an array of strings the next. How a variable is interpreted in such a class-free system depends on the context in which it is used.

After reading a bit about duck typing, it does seem very relevant to my post about categories in vision. In fact, the statement about ducks says something deep about human understanding and is relevant to empiricism.
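
As a rough illustration of the two ideas above (a variable that changes type, and duck typing), here is a minimal Python sketch. The classes `Duck`, `RobotDuck`, and `Cat` and the helper `looks_like_a_duck` are hypothetical names invented for this example; they do not come from the original post.

```python
class Duck:
    def walk(self): return "waddle"
    def swim(self): return "paddle"
    def quack(self): return "Quack"

class RobotDuck:
    # Not related to Duck by inheritance, but offers the same behaviours.
    def walk(self): return "roll"
    def swim(self): return "propel"
    def quack(self): return "Beep-quack"

class Cat:
    def walk(self): return "prowl"

def looks_like_a_duck(thing):
    """Duck typing: judge by available behaviours, not by declared class."""
    return all(callable(getattr(thing, name, None))
               for name in ("walk", "swim", "quack"))

# Dynamic typing: the same name holds an integer, then a list of strings.
x = 42
x = ["now", "a", "list"]

results = {type(t).__name__: looks_like_a_duck(t)
           for t in (Duck(), RobotDuck(), Cat())}
print(results)  # {'Duck': True, 'RobotDuck': True, 'Cat': False}
```

The check never asks what class an object belongs to, only which behaviours it exposes, which is roughly the attribute-based view of recognition discussed in the post.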

This attribute-based approach reminds me of Python'...

2010-02-09T17:42:41.046-05:00

This attribute-based approach reminds me of Python's duck-typing approach to classes: "when I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck."

It's interesting.

Working on knowledge processing (ontologies et cetera), I should also add that the border between classes and attributes is relatively thin. Very often, classes (i.e. categories) are actually defined in terms of the attributes an object has. It goes both ways: an instance that has the right attributes can be inferred to belong to some class; an instance of a class inherits all the attributes of this class.
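
The two directions of inference described above can be sketched in a few lines of Python. This is only a toy illustration: the `CLASS_DEFINITIONS` table and both function names are hypothetical, and real ontology languages are far richer than set containment.

```python
# Hypothetical toy ontology: each class is defined by a set of attributes.
CLASS_DEFINITIONS = {
    "bird": {"has_feathers", "lays_eggs"},
    "duck": {"has_feathers", "lays_eggs", "quacks", "swims"},
}

def infer_classes(attributes):
    """Attributes -> classes: an instance belongs to every class
    whose defining attributes it all has."""
    observed = set(attributes)
    return {name for name, required in CLASS_DEFINITIONS.items()
            if required <= observed}

def inherited_attributes(class_name):
    """Class -> attributes: an instance of a class has all of the
    attributes that define that class."""
    return set(CLASS_DEFINITIONS[class_name])

observed = {"has_feathers", "lays_eggs", "quacks", "swims", "is_yellow"}
print(sorted(infer_classes(observed)))        # ['bird', 'duck']
print(sorted(inherited_attributes("bird")))   # ['has_feathers', 'lays_eggs']
```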

I'm right now drinking coffee with a nice red ...

2010-01-11T17:23:56.382-05:00

I'm right now drinking coffee with a nice red copy of this Categorization book next to me. I'm on page 56 reading Perona's essay. I have read most of the chapters, and I agree that Dickinson's treatise is excellent. I have commented on Edelman's and Bar's chapters before on my blog (even when those articles weren't part of this book yet).

On a related note, you may be interested in the fo...

2010-01-11T17:09:02.494-05:00

On a related note, you may be interested in the following recent edited volume:

Object Categorization: Computer and Human Vision Perspectives by Sven J. Dickinson, Ales Leonardis, Bernt Schiele, and Michael J. Tarr (Hardcover - Sep 7, 2009)

Dickinson especially has a nice historical survey of object categorization in computer vision.

The concept of categorization could be applied in...

2009-12-24T18:55:57.630-05:00

The concept of categorization could be applied in another field, namely document search and machine translation. What I have in mind is tagging words in a document. First, proper names could be tagged to separate Baker the name from baker the occupation. Second, parts of speech could be coded to avoid the "time flies" ambiguity. Third, present and past tense could be marked for words like "put", which use the same form for both. Finally, the big one would be to identify multiple meanings of the same word.

I realize this would be an enormous task that could be only partially automated. Might it be worth it?

Dear Anonymous, I want to comment on your idea of...

2009-12-17T18:28:36.257-05:00

Dear Anonymous,

I want to comment on your idea of computing attributes directly from 3d shape information *without* recognition. First of all, computing 3d shape information from a static image is no easy task. Now when we're talking about an agent, we can have access to 2.5D imagery (from stereo or laser rangefinding), but we'll still have to segment out the object from the background. If the object is stationary on a stationary background (think of a phone on top of a stack of yellow pages books which is on top of a desk), then the agent will have to use prior experience to segment out the object. Without using information about what phones/books/tables look like (and/or their 3d shape), how will the agent know how many objects there are? What will prevent the agent from thinking that it is seeing a book-o-phone object instead of two distinct objects?

In short, I doubt any sort of coping with the environment (whether it is attribute-based or category-based) is possible without resorting to the wealth of past experience that an agent must possess. The question of whether past experience is stored in a memory-like system, or abstract category-based models are retained, is still open to debate.

I do believe that in some highly controlled environments reasoning about attributes directly from perception (Gibson-style "direct perception") is possible, but not for real natural environments. What this means is that such simple and silly environments can be useful (think of a child playing with a single toy in a clutter-free playpen) during training, but the ultimate test will have to be done in a more malicious setting.

It might even be that recognition isn't requir...

2009-12-17T10:07:15.401-05:00

It might even be that recognition isn't required at all for attributes. If an agent can get some idea about the 3d shape of the object, then attributes can be computed directly from that. I'm thinking about "can grab", "can throw", "can sit on".

I mean, there's no need to relate the object to things, or examples, in the agent's memory for such things.

Andrew -- thanks for the pointers. I've seen ...

2009-12-15T16:50:31.136-05:00

Andrew -- thanks for the pointers. I've seen these papers before but haven't gotten around to reading them in detail.

I definitely agree that attributes haven't gotten the attention they deserve in the field of computer vision. Whether they are "more fundamental" than object categories I'm not sure yet. I definitely think it is a worthwhile discussion. Unfortunately, the way the computer vision community evaluates progress in object recognition is biased towards categories. I surmise that when the community seriously considers embodied vision systems and leaves behind non-interactive vision, we will see a significantly stronger interest in property/attribute-based recognition.

Santosh -- Thanks for the link. The ideas advocat...

2009-12-15T16:42:56.984-05:00

Santosh -- Thanks for the link. The ideas advocated in this workshop are definitely up my alley. This is the type of venue that would most likely agree with the discussions/questions from my blog.

Phong -- I agree that object attributes cannot be learned from one image. The problem of learning object categories or object attributes from a single image is ill-posed and very difficult. I agree that once categorization has been performed it is easy to pull out the right attributes from a large database which maintains the attributes for a given category. What I wanted to argue in this blog post is that if a robotic agent is to make sense of AND interact with the world, it will have to reason about objects in terms of attributes. Thus if the attributes are somehow more important than object categories, perhaps we should ask ourselves if we can predict attributes without estimating categories first.

Tomasz-- another recent attributes paper from UIUC...

2009-12-15T15:16:05.183-05:00

Tomasz-- another recent attributes paper from UIUC:

Joint learning of visual attributes, object classes and visual saliency, ICCV 2009.

and a specific one for faces:
Attribute and simile classifiers for face verification. ICCV 2009.

It is true that it isn't always clear what is an attribute and what is an object (e.g. "has tail"), but that shouldn't lead us to conclude that attributes are not a useful (or even the most fundamental) approach for vision.

It seems fancy from your idea, but I think object ...

2009-12-08T08:34:42.224-05:00

Your idea seems fancy, but I think object categorization is still an inevitable component of an intelligent system. An object's attributes cannot be learned from just one image, so we have to prepare a database of knowledge about them. Once an object is recognized correctly, an inference engine will pull out the right attributes from the database to annotate that object. What happens if a zebra is recognized as a tiger? Basically, object categorization is indeed the right goal to pursue.

Just in case you did not look at this: http://www....

2009-12-07T06:30:41.865-05:00

Just in case you did not look at this: http://www.idiap.ch/~bcaputo/icvw09.html

Thanks for the Whitman Richards reference. The 20...

2009-12-04T13:18:59.823-05:00

Thanks for the Whitman Richards reference. The 20 question game is definitely relevant.

This idea of property/attribute based recognition a...

2009-12-04T00:17:41.186-05:00

This idea of property/attribute-based recognition also appears in some early papers of Whitman Richards, e.g. How To Play Twenty Questions With Nature and Win.

I would argue that attribute-based recognition is ...

2009-12-02T10:03:22.283-05:00

I would argue that attribute-based recognition is quite different from category-based recognition, although, as I discussed them, attributes do seem to have categories of their own. The chief difference is that in a category-based view, the world of objects is cut up into rigid categories, while in an attribute-based view each object is more flexibly represented as a point in an attribute vector space.

In the paper "Describing objects by their attributes" some of the attributes chosen are indeed very similar to object categories. It's not clear to me what the attributes should be, or whether they should be binary, discrete, or continuous. However, I think the attributes should capture non-visual properties of objects such as "is edible" and "is graspable with one hand", because only by understanding objects in terms of these concepts will a robotic agent have true understanding of the world around it. In summary, attributes should be tied to the sense modalities and motor/action capabilities of the agent using vision as a means to an end, and not be purely visual ("is furry") or part-based ("has tail" and "has leg").
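
To make the contrast between rigid categories and points in an attribute vector space concrete, here is a small Python sketch. Everything in it is assumed for illustration: the attribute names are taken from the examples above, but the objects, the labels, and the numeric values are made up and do not come from any dataset or paper.

```python
import math

# Attribute names borrowed from the discussion; values below are invented.
ATTRIBUTES = ("is_furry", "has_tail", "is_edible", "graspable_with_one_hand")

# Category view: each object receives exactly one label.
category_of = {"apple": "fruit", "cat": "animal", "plush_toy": "toy"}

# Attribute view: each object is a point in attribute space.
# Values are in [0, 1]; binary here, but they could be continuous.
attribute_vector = {
    "apple":     (0.0, 0.0, 1.0, 1.0),
    "cat":       (1.0, 1.0, 0.0, 0.0),
    "plush_toy": (1.0, 1.0, 0.0, 1.0),
}

def distance(a, b):
    """Euclidean distance between two objects in attribute space."""
    return math.dist(attribute_vector[a], attribute_vector[b])

# The three objects fall into three unrelated categories, yet in
# attribute space the plush toy sits closer to the cat than to the apple.
print(distance("plush_toy", "cat") < distance("plush_toy", "apple"))  # True
```

In the category view the three labels say nothing about similarity, whereas the attribute view gives a graded notion of how alike two objects are, including along action-related dimensions such as graspability.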

Attribute-based recognition seems like a fantastic...

2009-12-02T05:29:22.944-05:00

Attribute-based recognition seems like a fantastic idea. But it seems that attributes are also categories. Tail is a category, right? The same goes for leg.