Welcome to the era of visual intelligence -- the era of Visual Memex-based overlords (now in 3D!)
Image from Context-Based Search for 3D Models, by Matthew Fisher and Pat Hanrahan
The goal of today's post is simple: to empower you, the reader, with an exciting and fresh perspective on the problem of visual reasoning. This simple idea is one of the central tenets promulgated in my upcoming doctoral dissertation -- but I'd like to give this potent meme a head start. Visual Memex-style reasoning is not the kind of reasoning that is described in classic graduate-level textbooks on AI (e.g. first-order logic). If you've mentally over-fit to a graduate-level CS curriculum, you might even portray my iconoclastic views as the ramblings of a lunatic -- this is okay, I know at least Ludwig would be proud.
The Visual Memex is a mentality/perspective which, I believe, can overcome many limitations faced by modern computer vision systems. What the Visual Memex can do for visual intelligence is akin to what the World Wide Web has done for knowledge (see Weinberger's excellent book "Everything is Miscellaneous" for the full argument). It's akin to using Google for acquiring knowledge instead of going to the library -- maybe knowledge was never meant to be embedded in bookshelves. The idea is embarrassingly simple: replace visual object categories with object exemplars and relationships between those exemplars. Maybe the linguistic categories that we (as humans) cannot seem to live without are mere shadows cast on the wall of a dark cave. Psychologists have long abandoned rigid categories in their models of how humans think about concepts, but the notion of a class is so fundamental to contemporary Machine Learning that many haven't even bothered to question its tenuous foundations. While categories (also referred to as classes) definitely make learning algorithms easier to formalize, maybe it's better to let the data speak for itself. Free the data!
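To make "exemplars plus relationships" a bit more concrete, here is a toy sketch in Python. It is only an illustration of the mentality, not the model from any of the papers linked below: the `ExemplarGraph` class, the exemplar ids, the two-number feature vectors, and the relation names `appears_with` and `looks_like` are all made up for this example. Note that no class label is stored anywhere; a query is answered by finding the closest stored exemplar and following its relationships.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ExemplarGraph:
    """Toy category-free memory: nodes are individual exemplars, not classes."""

    # exemplar id -> feature vector (hypothetical hand-made descriptors)
    features: dict = field(default_factory=dict)
    # exemplar id -> list of (relation name, other exemplar id)
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add_exemplar(self, ex_id, feature):
        self.features[ex_id] = tuple(feature)

    def relate(self, a, b, relation):
        # Store the relationship in both directions.
        self.edges[a].append((relation, b))
        self.edges[b].append((relation, a))

    def nearest(self, feature, k=1):
        # Rank stored exemplars by squared Euclidean distance to the query.
        def dist(ex_id):
            return sum((x - y) ** 2 for x, y in zip(self.features[ex_id], feature))

        return sorted(self.features, key=dist)[:k]

    def neighbors(self, ex_id, relation):
        # Exemplars linked to ex_id by the given kind of relationship.
        return [other for rel, other in self.edges[ex_id] if rel == relation]


graph = ExemplarGraph()
graph.add_exemplar("img1/object3", (0.9, 0.1))
graph.add_exemplar("img1/object7", (0.2, 0.8))
graph.add_exemplar("img2/object1", (0.85, 0.15))
graph.relate("img1/object3", "img1/object7", "appears_with")
graph.relate("img1/object3", "img2/object1", "looks_like")

# A new, unlabeled object is explained by its closest exemplar and that
# exemplar's relationships, never by a category name.
match = graph.nearest((0.88, 0.12), k=1)[0]
context = graph.neighbors(match, "appears_with")
```

Here `match` ends up being the stored exemplar closest to the query, and `context` lists whatever that exemplar was seen alongside. A real system would of course need learned similarity measures rather than raw Euclidean distance on toy vectors.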
One upcoming research paper inspired by this category-free mentality is: Context-Based Search for 3D Models, by Matthew Fisher and Pat Hanrahan, of Stanford University. This paper will be presented at SIGGRAPH Asia 2010. Maybe it is time to abandon those rigid categories and memexify your own research problem?
- The Classic: Vannevar Bush's Memex
- Plato's Allegory of the Cave
- 2D object recognition: Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships
- 3D object modeling: Context-Based Search for 3D Models