
Sunday, June 13, 2010

everything is misc -- torralba cvpr paper to check out

Weinberger's Everything is Miscellaneous is a delightful read -- I just finished it today while flying from PIT to SFO.  It was recommended to me by my PhD advisor, Alyosha, and now I can see why!  Many of the key motivations behind my current research on object representation deeply resonate in Weinberger's book.

Weinberger motivates Rosch's theory of categorization (the Prototype Model), and explains how it is a significant break from thousands of years of Aristotelian thought. Aristotle gave us the notion of a category -- centered around the notion of a definition. For Aristotle, every object can be stripped to its essential core and placed in its proper place in a God's-eye objective organization of the world. It was Rosch who showed us that categories are much fuzzier and more hectic than the rigid Aristotelian system suggests. Just as Copernicus single-handedly stopped the Sun and set the Earth in motion, Rosch disintegrated our neatly organized world-view and demonstrated how an individual's path through life shapes their concepts.

I think it is fair to say that my own ideas, as well as Weinberger's, aren't so much an extension of the Roschian mode of thought as a significant break from the entire category-based way of thinking. Given that Rosch studied Wittgenstein as a student, I'm surprised her stance wasn't more extreme, more along the anti-category line of thought. I don't want to undermine her contribution to psychology and computer science in any way, and I want to be clear that she should only be lauded for her remarkable research. Perhaps Wittgenstein was as extreme and iconoclastic as I like my philosophers to be, but Rosch provided us with a computational theory and not just a philosophical lecture.

From my limited expertise in theories of categorization within Psychology, whether we are talking about Prototype Models or the more recent data-driven Exemplar Models, these are still theories of categories. Whether the similarity computations are between prototypes and stimuli or between exemplars and stimuli, the output of a categorization model is still a category. Weinberger is all about modern data-driven notions of knowledge organization that break free from the imprisoning notion of a category. Knowledge is power, so why imprison it in rigid modules called categories? Below is a toy visualization of a web of concepts, as imagined by me. This is very much the web-based view of the world: Wikipedia is just a bunch of pages and links.

Artistic rendition of a "web of concepts"

I found it valuable to think of the Visual Memex, the model I'm developing in my thesis research, as an anti-categorization model of knowledge -- a vast network of object-object relationships. The idea of using little concrete bits of information to create a rich non-parametric web is the recurring theme in Weinberger's book. In my case, the problem of extracting primitives from images, and all of the problems of dealing with real-world images, are still around to plague me, so the Visual Memex must rely on many Computer Vision techniques -- such things are not discussed in Weinberger's book. The "perception" or "segmentation" component of the Visual Memex is far from trivial, whereas linking words on the web is much easier.
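
As a toy schematic (purely my own illustration, not the actual Visual Memex implementation), such an anti-categorization store is just a graph whose nodes are concrete object instances and whose edges are pairwise relationships, with no category nodes anywhere:

```python
# Toy web of object-object relationships (hypothetical instance names).
# Each node is a concrete object instance; each edge is a typed relationship.
memex = {
    "monitor_17":  [("keyboard_03", "co-occurs-with"), ("desk_41", "sits-on")],
    "keyboard_03": [("monitor_17", "co-occurs-with"), ("desk_41", "sits-on")],
    "desk_41":     [("chair_09", "next-to")],
    "chair_09":    [("desk_41", "next-to")],
}

# "What goes with a monitor?" is answered by walking edges,
# not by looking up a category label.
for neighbor, relation in memex["monitor_17"]:
    print(f"monitor_17 --{relation}--> {neighbor}")
```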

CVPR paper to look out for

However, the category-based view is all around us.  I expect most of this year's CVPR papers to fit in this category-based view of the world. One paper, co-authored by the great Torralba, looks relevant to my interests.  It is yet another triumph for the category-based mentality in computer vision.  In fact, one of the figures in the paper demonstrates the category-based view of the world very well.  Unlike the memex, the organization is explicit in the following figure:





Exploiting Hierarchical Context on a Large Database of Object Categories. Myung Jin Choi, Joseph Lim, Antonio Torralba, and Alan S. Willsky. CVPR 2010.

Monday, October 19, 2009

Scene Prototype Models for Indoor Image Recognition

In today's post I want to briefly discuss a computer vision paper which has caught my attention.

In the paper Recognizing Indoor Scenes, Quattoni and Torralba build a scene recognition system for categorizing indoor images. Instead of performing learning directly in descriptor space (such as the GIST over the entire image), the authors use a "distance-space" representation. An image is described by a vector of distances to a large number of scene prototypes. A scene prototype consists of a root feature (the global GIST) as well as features belonging to a small number of regions associated with the prototype. One example of such a prototype might be an office scene with a monitor region in the center of the image and a keyboard region below it -- however the ROIs (which can be thought of as parts of the scene) are often more abstract and do not neatly correspond to a single object.
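
To make the distance-space idea concrete, here is a minimal sketch (my own toy code with made-up dimensions and random placeholders, not the authors' implementation); the real prototypes also carry ROI features, which this sketch omits:

```python
import numpy as np

def distance_space_features(gist, prototypes):
    """Describe an image by its distances to each scene prototype.

    gist:       (d,) global descriptor of the image (e.g. GIST).
    prototypes: (P, d) array holding one root descriptor per prototype.
    Returns a (P,) vector of distances: the "distance-space" representation.
    """
    return np.linalg.norm(prototypes - gist, axis=1)

# Toy stand-ins: 512-dimensional GIST-like descriptors and 200 prototypes.
rng = np.random.default_rng(0)
prototypes = rng.normal(size=(200, 512))
image_gist = rng.normal(size=512)
phi = distance_space_features(image_gist, prototypes)
print(phi.shape)  # (200,) -- one distance per prototype
```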


The learning problem (which is solved once per category) is then to find the internal parameters of each prototype as well as the per-class prototype distance weights which are used for classification. From a distance function learning point of view, it is rather interesting to see distances to many exemplars being used as opposed to the distance to a single focal exemplar.
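
Schematically, the per-class stage can then be pictured as fitting weights over those prototype distances; the sketch below is a deliberate simplification using an off-the-shelf linear classifier and ignores the fact that the paper also learns each prototype's internal parameters:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-ins: distance-space features for N training images (see sketch above).
rng = np.random.default_rng(1)
X = rng.random((500, 200))        # N images x P prototype distances
y = rng.integers(0, 2, size=500)  # 1 = "office", 0 = everything else (one-vs-rest)

# One linear model per category: its weights say how much each
# prototype distance matters for deciding that category.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.coef_.shape)  # (1, 200) -- one weight per prototype
```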

Although the authors report results on the image categorization task, it is worthwhile to ask whether scene prototypes could be used for object localization. While it is easy to play the evil genius and devise an image unique enough that it doesn't conform to any notion of a prototype, I wouldn't be surprised if 80% of the images we encounter on the internet conform to a few hundred scene prototypes. Of course, the problem of learning such prototypes from data without prototype labeling (which requires expert vision knowledge) is still open. Overall, I like the direction and ideas contained in this research paper and I'm looking forward to seeing how they develop.

Friday, June 19, 2009

A Shift of Focus: Relying on Prototypes versus Support Vectors

The goal of today's blog post is to outline an important difference between traditional categorization models in Psychology such as Prototype Models, and Support Vector Machine (SVM) based models.


When solving an SVM optimization problem in the dual (given a kernel function), the answer is represented as a set of weights associated with each of the data-centered kernels. In the Figure above, an SVM is used to learn a decision boundary between the blue class (desks) and the red class (chairs). The sparsity of such solutions means that only a small set of examples is used to define the class decision boundary. Non-zero weights go only to the points on the wrong side of the decision boundary and to the correctly but barely classified points (on or within the margin). Many Machine Learning researchers get excited about the sparsity of such solutions because, in theory, we only need to remember a small number of kernel evaluations (the support vectors) at test time. However, the decision boundary is defined with respect to the problematic examples (misclassified and barely classified ones) and not the most typical examples. The most typical (and easiest to recognize) examples are not even necessary to define the SVM decision boundary. Two data sets that have the same problematic examples, but significant differences in the "well-classified" examples, can result in exactly the same SVM decision boundary.
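
Here is a quick toy illustration of that last point (a sketch with scikit-learn on synthetic 2-D blobs, not tied to any vision dataset): refitting the SVM on its support vectors alone reproduces essentially the same boundary, because the easy examples never mattered.

```python
import numpy as np
from sklearn.svm import SVC

# Two overlapping 2-D Gaussian blobs: "desks" (0) vs "chairs" (1), toy data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(+2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

svm = SVC(kernel="rbf", C=1.0).fit(X, y)
sv = svm.support_  # indices of the "problematic" examples (the support vectors)
print(f"{len(sv)} of {len(X)} points are support vectors")

# Refit on the support vectors alone: the decision boundary is essentially
# unchanged, so the well-classified examples contributed nothing to it.
svm_sv_only = SVC(kernel="rbf", C=1.0).fit(X[sv], y[sv])
test = rng.normal(0.0, 2.0, size=(1000, 2))
agreement = np.mean(svm.predict(test) == svm_sv_only.predict(test))
print(f"the two boundaries agree on {agreement:.1%} of random test points")
```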

My problem with such boundary-based approaches is that by focusing only on the boundary between classes, useful information is lost. Consider what happens when two points are correctly classified and fall well beyond the margin on their correct side: the distance-to-decision-boundary is no longer a good measure of class membership. By failing to capture the "density" of the data, the sparsity of such models can actually be a bad thing. For a discriminative method such as the SVM, reasoning about the support vectors is useful for close-call classification decisions, but we lose fine-scale membership details (aka "density information") far from the decision surface.


In a single-prototype model (pictured above), a single prototype is used per class and the distances to the prototypes implicitly define the decision surface. The focus is on exactly the "most confident" examples, namely the prototypes. Prototypes are created during training -- if we fit a Gaussian distribution to each class, the mean becomes the prototype. Notice that by focusing on prototypes, we gain density information near the prototype at the cost of losing fine details near the decision boundary. Single-prototype models generally perform worse on forced-choice classification tasks than their SVM-based discriminative counterparts; however, there are important regimes where too much emphasis on the decision boundary is a bad thing.
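
A minimal sketch of such a single-prototype classifier (my own toy illustration on synthetic data): fit one prototype per class as the class mean and classify by nearest prototype, with the distance doubling as a typicality score.

```python
import numpy as np

def fit_prototypes(X, y):
    """One prototype per class: the class mean (the fit-a-Gaussian view)."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def classify(x, classes, prototypes):
    """Nearest-prototype decision; the distance doubles as a typicality score."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    return classes[np.argmin(dists)], dists.min()

# Toy 2-D data: class 0 ("desks") vs class 1 ("chairs").
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(+2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

classes, prototypes = fit_prototypes(X, y)
label, dist = classify(np.array([-1.8, -2.2]), classes, prototypes)
print(label, dist)  # a small distance means a highly typical member of its class
```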

In other words, Prototype Methods are best at what they were designed to do in categorization, namely capturing Typicality Effects (see Rosch). It would be interesting to come up with more applications where handling Typicality Effects and graded membership becomes more important than making close-call classification decisions. I suspect that in many real-world information retrieval applications (where high precision is required and low recall is tolerated), going beyond boundary-based techniques is the right thing to do.

Friday, June 12, 2009

Exemplars, Prototypes, and towards a Theory of Concepts for AI

While initial musings (and some early theories) on Categorization come from Philosophy (think Categories by Aristotle), most modern research on Categorization which adheres to the scientific method comes from Psychology (see Concept Learning on Wikipedia). Two popular models originating in the Psychology literature are Prototype Theory and Exemplar Theory. Summarizing briefly: in Prototype Theory a category is represented by an abstraction that summarizes it, while in Exemplar Theory a category is represented nonparametrically by its stored members. While I'm personally a big proponent of Exemplar Theory (see my Recognition by Association CVPR2008 paper), I'm not going to discuss the details of my philosophical stance in this post. I want to briefly point out the shortcomings of these two simplified views of concepts.
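
As a caricature of that difference (my own toy sketch, not a faithful psychological model): a prototype model compares a stimulus against one summary per category, while an exemplar model compares it against every stored member.

```python
import numpy as np

def prototype_score(stimulus, exemplars):
    """Prototype Theory, caricatured: distance to one abstraction, the category mean."""
    return np.linalg.norm(stimulus - exemplars.mean(axis=0))

def exemplar_score(stimulus, exemplars):
    """Exemplar Theory, caricatured: best distance over all stored category members."""
    return np.linalg.norm(exemplars - stimulus, axis=1).min()

# Toy stimulus and toy category members in some feature space.
rng = np.random.default_rng(0)
chairs = rng.normal(size=(50, 8))
stimulus = rng.normal(size=8)
print(prototype_score(stimulus, chairs), exemplar_score(stimulus, chairs))
```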

Researchers focusing on Categorization are generally dealing with a very simplified (and overly academic) view of the world, where the task is to categorize a single input stimulus. The problem is that if we want a Theory of Concepts that will be the backbone of intelligent agents, we have to deal with the relationships between concepts with as much fervor as with the representations of the concepts themselves. While the debate concerning exemplars vs. prototypes has been restricted to these single-stimulus categorization experiments, it is not clear to me why we should prematurely adhere to one of these polarized views before we consider how we can make sense of inter-category relationships. In other words, if an exemplar-based view of concepts looks good so far yet is not as useful for modeling relationships as a prototype-based view, then we have to change our views. Following James' pragmatic method, we should evaluate category representations with respect to a larger system embodied in an intelligent agent (and its ability to cope with the world) and not the overly academic single-stimulus experiments dominating experimental psychology.

On another note, I submitted my most recent research to NIPS last week (supersecret for now), and went to a few Phish concerts. I'm driving to California next week and I start at Google at the end of June. I also started reading a book on James and Wittgenstein.