Tombone's Computer Vision Blog: concepts

Showing posts with label concepts. Show all posts

Thursday, March 08, 2012

"I shot the cat with my proton gun."

I often listen to lectures and audiobooks when I drive more than 2 hours because I don't always have the privilege of enjoying a good conversation with a passenger. Recently I was listening to some philosophy of science podcasts on my iPhone while driving from Boston to New York when the following sentence popped into my head:

"I shot the cat with my proton gun."

I had just listened to three separate Podcasts (one about Kant, one about Wittgenstein and one about Popper) when the sentence came to my mind. What is so interesting about this sentence is that while it is effortless to grasp, it uses two different types of concepts in a single sentence, a "proton gun" and a "cat." It is a perfectly normal sentence, and the above illustration describes the sentence fairly well (photo credits to http://afashionloaf.blogspot.com/2010/03/cat-nap-mares.html for the kitty, and http://www.colemanzone.com/ for the proton gun).

Cat == an "everyday" empirical concept

"Cat" is an everyday "empirical" concept, a concept with which most people have first hand experience (i.e., empirical knowledge). It is commonly believed that such everyday concepts are acquired by children at a young age -- it is an exemple of a basic level concept which people like Immanuel Kant and Ludwig Wittgenstein discuss at great length. We do not need a theory of cats for the idea of a cat to stick.

Image from shadowpaw99

Proton Gun == a "scientific" theoretical concept

On the other extreme is the "proton gun." It is an example of a theoretical concept -- a type of concept which rests upon classroom (i.e., "scientific") knowledge. The idea of a proton gun is akin to the idea of Pluto, an esophagus or cancer -- we do not directly observe such entities, we learn about them from books and by seeing illustrations such as the one below. Such theoretical constructs are the the entities which Karl Popper and the Logical Positivists would often discuss.

While many of us have never seen a proton (nor a proton gun), it is a perfectly valid concept to invoke in my sentence. If you have a scientific background, then you have probably seen so many artistic renditions of protons (see Figure below) and spent so many endless nights studying for chemistry and physics exams, that the word proton conjures a mental image. It is hard for me to thing of entities which trigger mental imagery as non-empirical.

How do we learn such concepts? The proton gun comes from scientific education! The cat comes from experience! But since the origins of the concept "proton" and the concept "cat" are so disjoint, our (human) mind/brain must be more-amazing-than-previously-thought because we have no problem mixing such concepts in a single clause. It does not feel like these two different types of concepts are stored in different parts of the brain.

The idea which I would like you, the reader, to entertain over the next minute or so is the following:

Perhaps the line between ordinary "empirical" concepts and complex "theoretical" concepts is an imaginary boundary -- a boundary which has done more harm than good.

One useful thing I learned from Philosophy of Science, is that it is worthwhile to doubt the existence of theoretical entities. Not for iconoclastic ideals, but for the advancement of science! Descartes' hyperbolic doubt is not dead. Another useful thing to keep in mind is Wittgenstein's Philosophical Investigations and his account of the acquisition of knowledge. Wittgenstein argued elegantly that "everyday" concepts are far from "easy-to-define." (see his family resemblances argument and the argument on defining a "game.") Kant, with his transcendental aesthetic, has taught me to question a hardcore empiricist account of knowledge.

So then, as good cognitive scientists, researchers, and pioneers in artificial intelligence, we must also doubt the rigidity of those everyday concepts which appear to us so ordinary. If we want to build intelligent machines, then we must be ready to break down own understanding of reality, and not be afraid to questions things which appear unquestionable.

In conclusion, if you find popular culture reference more palatable than my philosophical pseudo-science mumbo-jumbo, then let me leave you with two inspirational quotes. First, let's not forget Pink Floyd's lyrics which argued against the rigidity of formal education: "We don't need no education, We don't need no thought control." And the second, a misunderstood, yet witty aphorism which comes to us from Dr. Timothy Leary reminds us that there is a time for education and there is a time for reflection. In his own words: "Turn on, tune in, drop out."

Sunday, June 13, 2010

everything is misc -- torralba cvpr paper to check out

Weinberger's Everything is Miscellaneous is a delightful read -- I just finished it today while flying from PIT to SFO. It was recommended to me by my PhD advisor, Alyosha, and now I can see why! Many of the key motivations behind my current research on object representation deeply resonate in Weinberger's book.

Weinberger motivates Rosch's theory of categorization (the Prototype Model), and explains how it is a significant break from the thousand years of Aristotelian thought. Aristotle gave us the notion of a category -- centered around the notion of a definition. For Aristotle, every object can be stripped to its essential core, and place in its proper place in a God's-eye objective organization of the world. It was Rosch who showed us that categories are much fuzzier and more hectic than suggested by the rigid Aristotelian system. Just like Copernicus single-handedly stopped the Sun and set the Earth in motion, Rosch disintegrated our neatly organized world-view and demonstrated how an individual's path through life shapes h/er concepts.

I think it is fair to say that my own ideas as well as Weinberger's aren't so much an extension of the Roschian mode of thought, but also a significant break from the entire category-based way of thinking. Given that Rosch studied Wittgenstein as a student, I'm surprised her stance wasn't more extreme, more along the anti-category line of thought. I don't want to undermine her contribution to psychology and computer science in any way, and I want to be clear that she should only be lauded for her remarkable research. Perhaps Wittgenstein was as extreme and iconoclastic as I like my philosophers to be, but Rosch provided us with a computational theory and not just a philosophical lecture.

From my limited expertise in theories of categorization in the field of Psychology, whether it is Prototype Models or the more recent data-driven Exemplar Models, these theories are still theories of categories. Whether the similarity computations are between prototypes and stimuli, or between exemplars and stimuli, the output of a categorization model is still a category. Weinberger is all about modern data-driven notions of knowledge organization, in a way that breaks free from the imprisoning notion of a category. Knowledge is power, so why imprison it in rigid modules called categories? Below is a toy visualization of a web of concepts, as imagined by me. This is very much the web-based view of the world. Wikipedia is a bunch of pages and links.

Artistic rendition of a "web of concepts"

I found it valuable to think of the Visual Memex, the model I'm developing in my thesis research, as an anti-categorization model of knowledge -- a vast network of object-object relationships. The idea of using little concrete bits of information to create a rich non-parametric web is the recurring theme in Weinberger's book. In my case, the problem of extracting primitives from images, and all of the problem in dealing with real-world images are around to plague me, and the Visual Memex must rely on many Computer Vision techniques -- such things are not discussed in Weinberger's book. The "perception" or "segmentation" component of the Visual Memex is not trivial -- where linking words on the web is much easier.

CVPR paper to look out for

However, the category-based view is all around us. I expect most of this year's CVPR papers to fit in this category-based view of the world. One paper, co-authored by the great Torralba, looks relevant to my interests. It is yet another triumph for the category-based mentality in computer vision. In fact, one of the figures in the paper demonstrates the category-based view of the world very well. Unlike the memex, the organization is explicit in the following figure:

Exploiting Hierarchical Context on a Large Database of Object Categories

Exploiting Hierarchical Context on a Large Database of Object Categories

Myung Jin Choi, Joseph Lim, Antonio Torralba, and Alan S. Willsky. CVPR 2010.

Tuesday, January 26, 2010

Beyond Categories =? Doing without Concepts

The term "beyond category," from my limited knowledge, was originally coined to describe the music of Duke Ellington. It is a term of praise that acknowledges that one's style is inimitable and transcends barriers.

"Beyond Categories" was the first part of my NIPS 2009 paper's title. To "go beyond" means to transcend, to abandon or do without some limitation and strive higher -- there is nothing magical about my use of the term. I used the term category to refer to object categories, as are commonly used in computer vision, artificial intelligence, machine learning, as well as psychology, philosophy, and other branches of cognitive science. One of my research goals is to go beyond the use of categories as the basis for machine perception and visual reasoning. It has been argued by Machery that the term category is roughly equivalent to the term concept as used in psychology literature. In some sense the title of Machery's recent book, "Doing without concepts," is analogous to the phrase "Beyond categories" but to reassure myself I'll have to finish reading Machery's book.

So far the first chapter has been a delightful exposition into the world of concepts, a term dear to researchers in machine perception (AI) as well as human categorization (psychology). I look forward to reading the rest of the book, which I accidentally found while looking for Estes' book on categorization. I had already digested/assimilated some of Machery's work, in particular his paper titled Concepts are not a natural kind, so seeing his name on a book at the CMU library piqued my interest. In this 2005 paper, Machery argues that the debate between prototypes vs. exemplars vs. theories in the literature on concepts is not well-founded and there is no reason to believe a single theory should prevail. I'll attempt to summarize some of his take-home messages and their relevance to computer vision once I finish this book.

Tuesday, November 24, 2009

Understanding the role of categories in object recognition

If we set aside our academic endeavors of publishing computer vision and machine learning papers and sincerely ask ourselves, "What is the purpose of recognition?" a very different story emerges.

Let me first outline the contemporary stance on recognition (that is object recognition as is embraced by the computer vision community), which is actually a bit of a "non-stance" because many people working on recognition haven't bothered to understand the motivations, implications, and philosophical foundations of their work. The standard view of recognition is that it is equivalent to categorization -- assigning an object its "correct" category is the goal of recognition. Object recognition, as is found in vision papers, is commonly presented as single image recognition task which is not tied to an active and mobile agent that must understand and act in an environment around them. These contrived tasks are partially to blame for making us think that categories are the ultimate truth. Of course, once we've pinpointed the correct category we can look up information about the object category at hand in some sort of grand encyclopedia. For example, once we've categorized an object as a bird we can simply recall the fact that "it flies" from such a source of knowledge.

Most object recognition research is concerned with object representations (what features to compute from an image) as well as supervised (and semi-supervised) machine learning techniques to learn object models from data in order to discriminate and thus "recognize" object categories. The reason why object recognition has become so popular in the recent decade is that many researchers in AI/Robotics envision a successful vision system as a key component in any real-world robotic platform. If you ask a human to describe their environment, we will probably use a bunch of nouns to enumerate the stuff around them, so surely nouns must be the basic building blocks of reality! In this post I want to question this commonsense assumption that categories are the building blocks of reality and propose a different way of coping with reality, one that doesn't try to directly estimate a category from visual data.

I argue that just because nouns (and the categories they refer to) are the basis of effability for humans, it doesn't mean that nouns and categories are the quarks and gluons of recognition. Language is a relatively recent phenomenon for humans (think evolutionary scale here), and it is absent in many animals inhabiting the earth beside us. It is absurd to think that animals do not possess a faculty for recognition just because they do not have a language. Since animals can quite effectively cope with the world around them, there must be hope for understanding recognition in a way that doesn't invoke linguistic concepts.

Let me make my first disclaimer. I am not against categories altogether -- they have their place. The goal of language is human-human communication and intelligent robotic agents will inevitably have to map their internal modes of representation onto human language if we are to understand and deal with such artificial beings. I just want to criticize the idea that categories are found deep within our (human) neural architecture and serve as the basis for recognition.

Imagine a caveman and his daily life which requires quite a bit of "recognition"-abilities to cope with the world around him. He must differentiate pernicious animals from edible ones, distinguish contentious cavefolk from his peaceful companions, and reason about the plethora of plants around him. For each object that he recognizes, he must be able to determine whether it is edible, dangerous, poisonous, tasty, heavy, warm, etc. In short, recognition amounts to predicting a set of attributes associated with an object. Recognition is the linking of perceptible attributes (it is green and the size of my fist) to our past experiences and predicting attributes that are not conveyed by mere appearance. If we see a tiger, it is solely on our past experiences that we can call it dangerous.

So imagine a vector space, where each dimension encodes an attribute such as edible, throwable, tasty, poisonous, kind, etc. Each object can be represented as a point in this attribute space. It is language that gives us categories as a shorthand to talk about commonly found objects. Different cultures would give rise to different ways of cutting up the world, and this is consistent with what has been observed by psychologists. Viewing categories as a way of compressing attribute vectors not only makes sense but is in agreement with the idea that categories culturally arose much later than the ability for humans to recognize objects. Thus it makes sense to think of category-free recognition. Since a robotic agent who was programmed to think of the world in terms of categories will have to unroll categories to understand objects in terms of tangible properties if they are to make sense of the world around them, why not use the properties/attributes as the primary elements of recognition in the first place!?

from Describing objects by their attributes

These ideas are not entirely new. In Computer Vision, there is a CVPR 2009 paper Describing objects by their attributes by Farhadi, Endres, Hoiem, and Forsyth (from UIUC) which strives to understand objects directly using the ideas discussed above. In the domain of thought recognition, the paper Zero-Shot Learning with Semantic Output Codes by Palatucci, Pomerleau, Hinton, and Mitchell strives to understand concepts in a similar semantic basis.

I believe the field of computer vision has been conceptually stuck and the vehement reliance on rigid object categories is partially to blame. We should read more Wittgenstein and focus more on understanding vision as a mere component of artificial intelligence. If we play the recognize objects in a static image game (as Computer Vision is doing!) then we obtain a fragmented view of reality and cannot fully understand the relationship between recognition and intelligence.

Monday, October 26, 2009

Wittgenstein's Critique of Abstract Concepts

In his Philosophical Investigations, Wittgenstein argues against abstraction -- via several thought experiments he strives to annihilate the view that during their lives humans develop neat and consistent concepts in their minds (akin to building a dictionary). He criticizes the commonplace notions of meaning and concept formation (as were commonly used in philosophical circles at the time) and has contributed greatly to my own ideas regarding categorization in computer vision.

Wittgenstein asks the reader to come up with the definition of the concept "game." While we can look up the definition of "game" in a dictionary, we can't help but feel that any definition will be either too narrow or too broad. The number of exceptions we would need in a single definition scales as the number of unique games we've been exposed to. His point wasn't that game cannot be defined -- it was that the lack of a formal definition does not prevent us from using the word "game" correctly. Think of a child growing up and being exposed to multi-player games, single-player games, fun games, competitive games, games that are primarily characterized by their display of athleticism (aka sports or Olympic Games). Let's not forget activities such as courting and the Stock Market which are also referred to as "games." Wittgenstein criticizes the idea that during our lives we somehow determine what is common between all of those examples of games and form an abstract concept of game which determines how we categorize novel activities. For Wittgenstein, our concept of game is not much more than our exposure to activities labeled as games and our ability to re-apply the word game in future context.

Wittgenstein's ideas are an antithesis to Platonic Realism and Aristotle's Classical notion of Categories, where concepts/categories are pure, well-defined, and possess neatly defined boundaries. For Wittgenstein, experience is the anchor which allows us to measure the similarity between a novel activity and past activities referred to as games. Maybe the ineffability of experience isn't because internal concepts are inaccessible to introspection, maybe there is simply no internal library of concepts in the first place.

An experience-based view of concepts (or as my advisor would say, a data-driven theory of concepts) suggests that there is no surrogate for living a life rich with experience. While this has implications for how one should live their own life, it also has implications in the field of artificial intelligence. The modern enterprise of "internet vision" where images are labeled with categories and fed into a classifier has to be questioned. While I have criticized categories, there are also problems with a purely data-driven large-database-based approach. It seems that a good place to start is by pruning away redundant bits of information; however, judging what is redundant and how is still an open question.

Tuesday, June 16, 2009

On Edelman's "On what it means to see"

I previously mentioned Shimon Edelman in my blog and why his ideas are important for the advancement of computer vision. Today I want to post a review of a powerful and potentially influential 2009 piece written by Edelman.

Below is a review of the June 16th, 2009 version of this paper:
Shimon Edelman, On what it means to see, and what we can do about it, in Object Categorization: Computer and Human Vision Perspectives, S. Dickinson, A. Leonardis, B. Schiele, and M. J. Tarr, eds. (Cambridge University Press, 2009, in press). Penultimate draft.

I will refer to the article as OWMS (On What it Means to See).

The goal of Edelman's article is to demonstrate the limitations of conceptual vision (referred to as "seeing as"), criticize the modern computer vision paradigm as being overly conceptual, and show how providing a richer representation of a scene is required for advancing computer vision.

Edelman proposes non-conceptual vision, where categorization isn't forced on an input -- "because the input may best be left altogether uninterpreted in the traditional sense." (OWMS) I have to agree with the author, where abstracting away the image into a conceptual map is not only an impoverished view of the world, but it is not clear whether such a limited representation is useful for other tasks relying on vision (something like the bottom of Figure 1.2 in OWMS or the Figure seen below and taken from my Recognition by Association talk).

Building a Conceptual Map = Abstracting Away

Drawing on insights from the influential Philosopher Wittgenstein, Edelman discusses the difference between "seeing" versus "seeing as." "Seeing as" is the easy-to-formalize map-pixels-to-objects attitude which modern computer vision students are spoon fed from the first day of graduate school -- and precisely the attitude which Edelman attacks in this wonderful article. To explain "seeing" Edelman uses some nice prose from Wittgenstein's Philosophical Investigations; however, instead of repeating the passages Edelman selected, I will complement the discussion with a relevant passage by William James:

The germinal question concerning things brought for the first time before consciousness is not the theoretic "What is that?" but the practical "Who goes there?" or rather, as Horwicz has admirably put it, "What is to be done?" ... In all our discussions about the intelligence of lower animals the only test we use is that of their acting as if for a purpose. (William James in Principles of Psychology, page 941)

"Seeing as" is a non-invertible process that abstracts away visual information to produce a lower dimensional conceptual map (see Figure above), whereas "seeing" provides a richer representation of the input scene. Its not exactly clear what is the best way to operationalize this "seeing" notion in a computer vision system, but the escapability-from-formalization might be one of the subtle points Edelman is trying to make about non-conceptual vision. Quoting Edelman, when "seeing" we are "letting the seething mass of categorization processes that in any purposive visual system vie for the privilege of interpreting the input be the representation of the scene, without allowing any one of them to gain the upper hand." (OWMS) Edelman goes on to criticize "seeing as" because vision systems have to be open-ended in the sense that we cannot specify ahead of time all the tasks that vision will be applied to. According to Edelman, conceptual vision cannot capture the ineffability (or richness) of the human visual experience. Linguistic concepts capture a mere subset of visual experience, and casting the goal of vision as providing a linguistic (or conceptual) interpretation is limited. The sparsity of conceptual understanding is one key limitation of the modern computer vision paradigm. Edelman also criticizes the notion of a "ground-truth" segmentation in computer vision, arguing that a fragmentation of the scene into useful chunks is in the eye of the beholder.

To summarize, Edelman points out that "The missing component is the capacity for having rich visual experiences... The visual world is always more complex than can be expressed in terms of a ﬁxed set of concepts, most of which, moreover, only ever exist in the imagination of the beholder." (OWMS) Being a pragmatist, many of these words resonate deeply within my soul, and I'm particularly attracted to elements of Edelman's antirealism.

I have to give two thumbs up to this article for pointing out the flaws in the current way computer vision scientists go about tackling vision problems (in other words researchers too often blindly work inside the current computer vision paradigm and do not often enough question fundamental assumptions which can help new paradigms arise). Many similar concerns regarding Computer Vision I have already pointed out on this blog, and it is reassuring to find others point to similar paradigmatic weaknesses. Such insights need to somehow leave the Philosophy/Psychology literature and make a long lasting impact in the CVPR/NIPS/ICCV/ECCV/ICML communities. The problem is that too many researchers/hackers actually building vision systems and teaching Computer Vision courses have no clue who Wittgenstein is and that they can gain invaluabe insights from Philosophy and Psychology alike. Computer Vision is simply not lacking computational methods, it is gaining critical insights that cannot be found inside an Emacs buffer. In order to advance the field, one needs to: read, write, philosophize, as well as mathematize, exercise, diversify, be a hacker, be a speaker, be one with the terminal, be one with prose, be a teacher, always a student, a master of all trades; or simply put, be a Computer Vision Jedi.