Tombone's Computer Vision Blog: "I shot the cat with my proton gun."

Thursday, March 08, 2012

"I shot the cat with my proton gun."

I often listen to lectures and audiobooks when I drive for more than two hours, because I don't always have the privilege of enjoying a good conversation with a passenger. Recently I was listening to some philosophy of science podcasts on my iPhone while driving from Boston to New York when the following sentence popped into my head:

"I shot the cat with my proton gun."

I had just listened to three separate podcasts (one about Kant, one about Wittgenstein, and one about Popper) when the sentence came to mind. What is so interesting about this sentence is that, while it is effortless to grasp, it combines two very different types of concepts: a "proton gun" and a "cat." It is a perfectly normal sentence, and the above illustration describes it fairly well (photo credits to http://afashionloaf.blogspot.com/2010/03/cat-nap-mares.html for the kitty, and http://www.colemanzone.com/ for the proton gun).

Cat == an "everyday" empirical concept

"Cat" is an everyday "empirical" concept, a concept with which most people have firsthand experience (i.e., empirical knowledge). It is commonly believed that such everyday concepts are acquired by children at a young age -- it is an example of a basic-level concept of the kind that Immanuel Kant and Ludwig Wittgenstein discuss at great length. We do not need a theory of cats for the idea of a cat to stick.

Image from shadowpaw99

Proton Gun == a "scientific" theoretical concept

On the other extreme is the "proton gun." It is an example of a theoretical concept -- a type of concept which rests upon classroom (i.e., "scientific") knowledge. The idea of a proton gun is akin to the idea of Pluto, an esophagus, or cancer -- we do not directly observe such entities; we learn about them from books and by seeing illustrations such as the one below. Such theoretical constructs are the entities which Karl Popper and the Logical Positivists would often discuss.

While many of us have never seen a proton (nor a proton gun), it is a perfectly valid concept to invoke in my sentence. If you have a scientific background, then you have probably seen so many artistic renditions of protons (see Figure below) and spent so many endless nights studying for chemistry and physics exams that the word proton conjures a mental image. It is hard for me to think of entities which trigger mental imagery as non-empirical.

How do we learn such concepts? The proton gun comes from scientific education! The cat comes from experience! But since the origins of the concept "proton" and the concept "cat" are so disjoint, our (human) mind/brain must be more-amazing-than-previously-thought because we have no problem mixing such concepts in a single clause. It does not feel like these two different types of concepts are stored in different parts of the brain.

The idea which I would like you, the reader, to entertain over the next minute or so is the following:

Perhaps the line between ordinary "empirical" concepts and complex "theoretical" concepts is an imaginary boundary -- a boundary which has done more harm than good.

One useful thing I learned from Philosophy of Science is that it is worthwhile to doubt the existence of theoretical entities. Not for iconoclastic ideals, but for the advancement of science! Descartes' hyperbolic doubt is not dead. Another useful thing to keep in mind is Wittgenstein's Philosophical Investigations and his account of the acquisition of knowledge. Wittgenstein argued elegantly that "everyday" concepts are far from easy to define (see his family resemblance argument and his argument about defining a "game"). Kant, with his transcendental aesthetic, has taught me to question a hardcore empiricist account of knowledge.

So then, as good cognitive scientists, researchers, and pioneers in artificial intelligence, we must also doubt the rigidity of those everyday concepts which appear to us so ordinary. If we want to build intelligent machines, then we must be ready to break down our own understanding of reality, and not be afraid to question things which appear unquestionable.

In conclusion, if you find popular culture references more palatable than my philosophical pseudo-science mumbo-jumbo, then let me leave you with two inspirational quotes. First, let's not forget Pink Floyd's lyrics, which argued against the rigidity of formal education: "We don't need no education, We don't need no thought control." The second, a misunderstood yet witty aphorism from Dr. Timothy Leary, reminds us that there is a time for education and a time for reflection. In his own words: "Turn on, tune in, drop out."

17 comments:

Srout, 12:28 PM
I can't quite see how this fits into anything, but when I read "It is commonly believed that such everyday concepts are acquired by children at a young age," I was reminded of my first-year psych course, when we studied human learning. When children are learning language, they make two very common mistakes: over-generalization and over-specification.

While a child may recognize that a furry creature with four legs and triangular ears is a cat, they may also apply that label to squirrels, terriers and skunks. They may also only apply the term cat to their family pet, and not know what to call other kitties.

So, I guess what I'm trying to say is that even very simple concepts like cat do take quite a while to sink in! We just don't remember the learning process. :3
helech ayef, 1:17 PM
I think that what we call the 'cat' concept is actually a mixture of several related concepts.
There is the visual 'concept' or 'model' which allows us to tell cats from non-cats when we see an object. This model can exist even without our ever explicitly expressing or explicitly learning what a cat is (learning from examples).
Another model or concept is the dictionary definition of cat.
An interesting distinction between these two is that the first cannot be communicated directly (only indirectly by examples, from which the visual model is built internally), while the second is, in my opinion, used mainly for communication purposes.
There is obviously some relation between the two, but I suppose these are stored in distinct areas in our brain, each encoded separately.
Of course there is also the vocal model of a cat etc - which I think can be also split into several models - the word meow is related to one (the dictionary definition one) and the other is the actual sound our cat makes.
Tomasz Malisiewicz, 4:16 PM
Hey Sam and Helech,

I agree with both of you, the simplicity with which we use the concept "cat" in our daily life is not likely to mean that the concept "cat" is simply stored in our brains/minds. But not enough people really believe this, as they try to extract meanings out of pixels directly.

--Tomasz
Unknown, 11:14 PM
I think it is quite easy to show that "ideas from experience" and "ideas from education"; or ordinary vs complex ideas are not different.

Myself, I have no real idea of what a proton gun is, other than a vague notion of what a proton is and what a gun is; so it's presumably something that shoots protons.

But before too long in thinking about a cat, it's no different. I can recognize a cat as "a cat" because that seems to be what we call those things that meow, in the same way I can recognize a proton gun because you labeled it unambiguously in your picture. But the implementation details are probably even fuzzier to me than those of a proton gun, and in that sense my "idea of a cat" doesn't really map to a cat at all, because if you get right down to it I don't even know what a cat is.

But for the sake of argument, let's assume that a proton gun is one type of idea and a cat is another type of idea. What type of idea is the cat I heard about from my friend but haven't directly experienced? Or what about the idea of "cats in general that belong to my friends that I have not yet met"?

It seems too easy to start with an idea that we decide is "from experience" and make incremental changes so that it is eventually "theoretical" (Ship of Theseus style).

I would posit instead that we simply have different inputs, some louder than others, and all of these inputs are smooshed together on the lens of consciousness. Our mental processing is a feedback loop and can "shine on its own lens," so that I can conceive of a two-horned unicorn and hold it in my head exactly the same way as a horse I rode once.

What some people are calling "theoretical" concepts are just inputs from mental processes. I think though that it turns out that even the idea of "cat" is virtually entirely made up of inputs from mental processes.

Even if I am *right this moment* watching a blue ball roll down a ramp, my concept of that ball and its rolling is so caught up in all the other things that, even in the very moment of direct experience, my concept of 'ball' is inseparable from a 'theoretical' concept.

I think it's akin to the way a Kinect works as opposed to a camera as opposed to actual photons, but in many different senses instead of just vision. Our minds at the conscious level, where we have things like ideas about cats, are not getting "raw" data about cats. The data has already gone through layers of abstraction, and the part that we identify with as "having an idea" is getting highly processed data and doesn't even have access to "raw experience."

Bringing it around to computer vision; it seems that we--as in the part of ourselves we subjectively identify as *I*-- are certainly not processing a 2d array of pixels and getting ideas from it, rather the input we are receiving has already been processed and transformed into something we can use.
Tomasz Malisiewicz, 11:45 AM
Hi unknown,

I appreciate your long comment. You cleverly argue that there is a sort of feedback loop, which means we are not merely bystanders when interpreting the world around us. Viewing "theoretical" concepts as ideas whose inputs are mental processes is useful. This reminds me of Philosophy of Science, especially when I learned about theory-laden observation (i.e., the claim that observations cannot be entirely theory-free).

The problem is that most people have strong realist tendencies which makes them earnestly believe that there is a world "out there" and we merely observe the world and take information in. Following this mentality into computer vision suggests that we build an ontology of the world's concepts, and train machines to recognize those concepts directly from raw sensor data (e.g., 2D arrays of pixels).

But because I embrace my antirealist tendencies (and I think you do too), I earnestly believe the assignment of the world's stuff into categories only happens at interpretation time. Categories and concepts aren't "out there" they are "within us." Maybe it doesn't even make sense to think of an X being a cat when no cognitive agent (embodied with an understanding of cats) is observing X.

It is not at all clear what machines must possess, in order to transform raw sensor data into experience. But I doubt a realist/empiricist account of the world will ever give us intelligent machines. Some type of creative ability, a sort of magical spark, will likely be necessary. It took evolution a long time to produce humans, yet we (the computer scientists, hackers, and philosophers) think that we can crack the puzzle in our lifetime -- seems a bit ridiculous!
Hossein Mobahi, 3:55 AM
If you are suggesting that a creative scientist must think outside of the box to have a fresh perspective, then I am with you. If you are suggesting that theory is redundant if you have empirical observation, then I disagree.

A theory that has practical implications is worth millions of experiments. With one theorem, you can be definite about when things are going to fail/work, as long as the assumptions are met. With experiments, even if you run millions of them, and they are all positive, it is no proof that you have solved the problem yet.

The trend of research in computer vision is actually getting very disappointing, because hacks and heuristics are replacing academic research. I learn much more from older CVPR papers than from the ones published in recent decades. The older ones used to rigorously formulate problems and do some analysis... One of my favorites is scale space theory. These days, most vision papers are no longer science; they are just hacks about picking newer features and then throwing an SVM at them. Tweak its parameters and hidden stuff until it produces good plots, and you have a vision paper.
Tomasz Malisiewicz, 2:39 PM
Hi Hossein,

I am definitely suggesting the former (not being intimidated by a new outlook) and far from suggesting the latter (the superiority of empirical observations over theory). I agree with you that the newer trends in CVPR/ICCV/ECCV make those papers very unexciting to read.

I am suggesting that a little bit of "armchair philosophy" can be very good for a young scientist when they want to tackle a problem as ambitious as object recognition. The fact is that submissions to the computer vision conferences are rising, and most of the research is being done by students with a rigorous training in mathematics, computer science, etc, but with very little training in thinking about "thinking." To make things worse, there is so much focus on experimental validation that almost everyone is all about "beating your curve" and not advancing science and philosophy.

Vision has been in the realm of "armchair philosophy" for millennia, and only in the last several decades can we see serious attempts by both cognitive scientists (who want to study human vision) and computer scientists (who want to build machines that see). But because it is not even clear what sort of vision tasks we should have machines solve, there is a danger in overly emphasizing progress on "standard vision tasks" such as Caltech256 categorization or Pascal VOC object detection.

I think to advance vision we must be more philosophical in our thinking; we must be better historians who truly understand the vision paradigms of earlier generations, better mathematicians, and more mature system builders. We also don't want an overly mathematical theory with no experimental results -- I find too many researchers who think that software engineering is "below" them.
Hossein Mobahi, 6:18 PM
Tomasz,

Thanks for the interesting comments. I completely agree with your statement "there is so much focus on experimental validation that almost everyone is all about beating your curve."
I also agree about the non-representative nature of the current visual learning datasets.

We might not be on the same page about the importance of theory for the scientific advancement of the field. However, I think we both believe in, at least, a good balance between theory and experiments, and both admit that the field relies too much on biased and limited experiments.

In that direction, I have something that might interest you and others. I recently collected some statistics about the relative weight of theory vs. experiment in CVPR and ICCV. The trend, from the beginning of these conferences up to now, consistently shows that theory is "dying" in computer vision very rapidly. This is a very serious alarm, in my opinion.

I am trying to write a one- or two-page note on this in a week or so. I will keep you posted once I have it.
mr smex, 10:54 AM
I've thought quite a bit about this post. However, I keep getting stuck on the following comment:

' Categories and concepts aren't "out there" they are "within us." '

Could you explain it more? Perhaps give some type of proof of it? For example, how does math fit into this paradigm? Are the theorems of the calculus within us too? I would grant you that we apply our own unique logical symbols, and the formulas could in some cases be written differently; that is, we put the calculus into our own human words. But isn't the calculus a recognition of phenomena completely outside of us?
Tomasz Malisiewicz, 5:05 PM
Dear Mr. Smex,

The particular phrase which you found worrisome has been a driving force in my own research endeavours, and I indirectly attribute it to the heroes of my youth (Rene Descartes, Thomas Kuhn, William James, among others). Before you label me a lunatic, I should clarify that I made the expression more aggressive in my blog than I would have in any sort of scientific publication. This is one of the great things about blogs: you can be as aggressive as you want.

When writing the blog post I was thinking primarily about visual object categories, because visual object recognition by machine has been the bread and butter of my formal education. You are correct in pointing out that "categories and concepts" do not need to be visual, nor apply to ontologies of visual phenomena. There are certain concepts like mathematics which are not private constructs -- perhaps they exist in a Platonic realm, perhaps they are a product of our human cognitive capacities. If anything, they are "within us" as a society rather than "within us" as individuals. I will not argue against this today -- the way in which the entire world seems to agree on calculus suggests that there is something universal about mathematics, something which transcends personal experience.

While I'm less of an anti-realist when it comes to mathematics, there is a broad class of visual phenomena which transcends rigid categories. Many great examples come from the Eleanor Rosch school of thought (i.e., Rosch and Lakoff). For example, when people were asked whether curtains are furniture, they would sometimes say yes and sometimes no. The same person would give inconsistent answers across multiple trials.

Back to the worrisome clause and what it means to me. For me, it is not something which requires explanation or proof. For many researchers, the primary focus is on new algorithms whose assumptions (i.e., the inputs and the partitioning of inputs into categories) are inherited from a generation of previous researchers. I think a good scientist should always stand on a strong foundation, and what better way to test your foundations than by questioning them? An anti-realist view suggests that the assumptions of many apply-machine-learning-to-computer-vision problems are also up for debate.

Great books which elucidate this temperament include anything related to William James' pragmatism, Kuhn's Structure of Scientific Revolutions, and the easy-going Everything Is Miscellaneous by Weinberger.

Personally, I only hope to inspire a generation of researchers. Sometimes the goal of science isn't to produce new theories, but to produce new eyes (i.e., novel ways of looking at the same thing). If you feel strongly against anything I said, feel free to point out any flaws/problems.

Happy Hacking,
Tomasz
Tomasz Malisiewicz, 5:12 PM
Hi Hossein,

I look forward to your analysis about "dying theory" in Computer Vision. And you are probably correct that we are most likely NOT on the same page when it comes to the interplay between theory and experimentation in Computer Vision. I am not sure exactly what your stance is, but I would love to hear your perspective on things. If you have something published (paper, blog, scanned notebook page, etc.) regarding your scientific philosophy, please point me in the right direction.

--Tomasz
Hossein Mobahi, 11:40 PM
Hi Tomasz,

I have not written anything myself yet, but I plan to, at least about that CVPR/ICCV trend. However, I have posted two PDF notes written by others that are partially similar to my philosophy (especially the one written by Yi Ma). You can find them on the main entry page of my website, if that interests you. I will keep you posted about my own note very soon :)
helech ayef, 1:00 PM
I would interpret the sentence:
' Categories and concepts aren't "out there" they are "within us." '

using Wittgenstein's "duck-rabbit" example:
http://en.wikipedia.org/wiki/Philosophical_Investigations#Seeing_that_vs._seeing_as

Human concepts are much more fluid and flexible than what current machine learning models offer, which is more or less: divide the world into a fixed set of classes, perhaps with some prebuilt structure among them (like a hierarchy).
Much of human cognition is related to the context in which objects appear. The data itself is often much more ambiguous and rich than the superficial labels attached to it in a dataset's ground truth; context often helps with disambiguation, but context is anything but objective.
Tomasz Malisiewicz, 12:20 PM
A lot of my ideas have been directly influenced by Wittgenstein, so it is not a surprise you see the link.
--Tomasz
Mr. Smex, 11:12 AM
While I greatly admire these questions and questioners, I wonder whether, for the near-term advancement of CV (near term being important because most change is incremental), these questions are too transcendent to be of any practical use. All I really want to do is create machines that see the things I want them to see and move through the world in a way that allows them to achieve their predefined goals (e.g., a robot that farms strawberries). Your machine may very well categorize a curtain as furniture, while mine may not even recognize it at all. To my mind it is irrelevant, as long as our respective machines define objects in the context of their predefined missions.

In either case who is to say what is right and what is wrong! Each of us will hopefully take the discipline a few steps forward in our own way so that someday one lucky fellow will pull it all together in one of those special 'Aha!' moments.
Tomasz Malisiewicz, 5:18 PM
Dear Mr. Smex,

I, for one, would rather be building intelligent machines than philosophizing about the ineffability of man-made concepts. However, vision is one of those things which just seems easy. I have dedicated a large fraction of my youth to learning the conceptual tools necessary to tackle such grand problems, but I feel there is something about the way we (as a society of visionaries) are structuring the vision problem which is rotten.

It's not like we have already built machines which do a good job at recognizing the ~20 or so easy visual concepts, and the sole remaining problems are just splitting hairs (such as "are curtains furniture?"). Vision just sucks. And I only hope to inspire. My blog will never be a surrogate for a formal and rigorous academic upbringing. You will never learn all you need to know about vision from my blog either. I tend to focus on things which I think the CVPR community is lacking, but are of high interest to many intellectuals out there.

I don't encourage anybody to overly philosophize, nor to spend 10 hours a day in front of a computer. Taste the flavors of different disciplines, read often, write often, and learn to communicate ideas clearly. I agree that most change is incremental, but groundbreaking change is rarely incremental. I hope a little bit of armchair philosophy will help revolutionary ideas emerge, but in the end I'm 100% with you in that actually building machines is really where it's at.
noodles, 6:07 PM
Here's one for you:

My cat contracted hyposomaticrheumatosis

For the record, hyposomaticrheumatosis is a word I just invented (or at least independently discovered if it existed already :P). What's important, though, is that you've never learned this word before in any context, but because it contains medical-sounding word parts and comes after "My cat contracted...", your brain automatically assumes it's some kind of cat disease. To me this says your brain is doing some pretty hefty conversion from what it sees (or hears or reads) to what it actually thinks... which I guess is kind of obvious if you think about it. But it stands in pretty stark contrast to a system that receives an input and automatically assigns one correct label to it. There's no correct label here, only a plausible one at best.

What's more, I'd be pretty convinced that this conversion process changes a lot from person to person. Consider the possible responses someone might have to hearing about my poor cat:

Oh my dear that's terrible!
Good, I hate your cat.
(Or from a medical professional) You're a liar, that's a made-up word.

The first two may sound like emotional responses that have no bearing on the conversation, but they illustrate how far your brain goes with something once it has heard it. Almost immediately your brain brings all the associations you have with my cat, or with cats in general, into the situation, which causes you to think or feel a certain way about what I just said. Your brain is using my words to paint a picture that it then uses to decide what to do next. The last one is interesting because it takes this one step further and is already using this information to gauge my future credibility.