Tombone's Computer Vision Blog: January 2006

Sunday, January 29, 2006

Philosophy of Robotics Reading Group

A fellow roboticist, Geoff, recently started the Philosophy of Robotics Reading Group and we had a great first meeting last Friday. We talked about robot emotions, and if anybody wants to see the papers that we discussed then please visit the reading group page.

I will one day present something about the current state and future of computer vision and how it relates to realist and pragmatist philosophy. In particular, I will talk about 'correspondence to reality' and how current work in computer vision poses objective functions that are consistent with the realist paradigm. On the other side, a pragmatist view of vision would be less concerned with correspondence to anything (what is really out there?), and more concerned with completion of a vision task.

From an evolutionary point of view, there is no reason for our internal representation to correspond to anything 'out there'; however, from an intelligent design point of view (which I do not advocate), it would seem appropriate to create a being that can get at something out there.

Saturday, January 21, 2006

pic of my brother from winter indoctrination 2006

Can you find Matt in this picture? Well, you might not be able to, but I surely can.

Wednesday, January 18, 2006

human retinal system

Today in class (advanced perception), we talked about the human retinal system and the various nerve cells that the signal passes through. It appears that there are many neural arrangements that pick up 'corners.' The reason why this is important is that these corners are very discriminative; humans can recognize many things from line drawings.

You could imagine a Martian who wants to study the human brain. An analysis of the human retinal system would reveal that we are geared for life on Earth (certain arrangements in the retinal system respond to certain visual structures), and the Martian could obtain invaluable details about the world on Earth by studying the human's brain. Of course, the Martian would not know what a particular human saw; he would see what humans as a species (a point in some infinite dimensional space that evolution is traversing) are tuned to see. He would probably have some knowledge of the distribution of spatial objects that we -- humans -- normally see and have some idea of the density of objects in the human-scale visual world on Earth.

On another note, I googled the term "human retinal system matched filter" and found a paper that was written by Michal Sofka and Charles Stewart about extracting vasculature using a set of matched filters; and even though I meant to find a web page which talked about how certain structures in the eye are tuned to respond to certain visual stimuli, I found a paper on a different topic which nevertheless mentions my name in the acknowledgements section! (They were using some of the generalized vessel tracing implementation I wrote while researching at RPI.)

Monday, January 16, 2006

compression as understanding

compression:

Instead of thinking of compression as something that is used to reduce the size of data, consider it as a measure of understanding data that generalizes well to unseen data. One should view compression as understanding.

Imagine that you walk to class using your normal route while listening to your iPod. Even though you were aware of your surroundings while walking, it was most likely an indirect experience where you remember only subtle little details related to the walk. However, you can still be 100% sure that you had taken the same path as last time. While you were walking, your brain 'understood' the environment dynamically and it only needed a low bit stream (dynamically compressed) of visual information to localize you. Think of this notion of compression as model fitting, where the objects of perception are the model parameters and the raw data is inaccessible.

Even though your experience of walking lasted 20 minutes, you feel like you didn't acquire much experience as opposed to spending 20 minutes in a completely new place. Your brain selectively took in information; you might have remembered seeing somebody you recognize drive by but forgot some of the songs that you listened to.
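To make the "compression as model fitting" idea a bit more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the "familiar route" is a synthetic sinusoid with small sensor noise, the "model" is simply that known sinusoid, and zlib stands in for a generic compressor. The point is only that once a model explains most of the signal, what is left over (the residual) takes far fewer bytes to store than the raw data.

```python
import math
import random
import zlib

def compressed_size(values):
    """Bytes zlib needs to store a list of integers in the range 0..255."""
    return len(zlib.compress(bytes(values), 9))

random.seed(0)
n = 2000

# Hypothetical model of a familiar route: a smooth periodic signal.
model = [int(round(128 + 100 * math.sin(2 * math.pi * t / 50))) for t in range(n)]

# Raw observations: the model prediction plus small sensor noise in -2..2.
signal = [m + random.randint(-2, 2) for m in model]

# Residuals: what remains once the model has explained the signal.
# Shifted by +2 so that every value fits in a byte (0..4).
residual = [s - m + 2 for s, m in zip(signal, model)]

raw_size = compressed_size(signal)
residual_size = compressed_size(residual)

print("raw signal, compressed:", raw_size, "bytes")
print("residual given the model, compressed:", residual_size, "bytes")
```

The model itself also has to be stored, but here it is only three numbers (offset, amplitude, and period), which is negligible next to the savings on the residual.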

The notion of "an object recognition system" using a "segmentation algorithm" is the traditional definition of segmentation as a mid-level process and object recognition as a high-level process. However, you can't really segment until you've recognized. Recognition and segmentation should be viewed on an equal footing; unified.

metaphysics: looking through a window of unsupervision

The role of metaphysics in the field of computer vision cannot be forgotten. Today I sat in my first class of the Spring 2006 semester. I'm very excited about this course, titled Advanced Robot Perception (or Advanced Machine Perception, but everybody calls it Advanced Perception anyways) which is being taught by my research advisor, Alexei Efros.

While sitting in class and listening to Alexei talk, I remembered my first day of 11th-grade high school honors English. On that first day of English class, the teacher had placed a quote on the blackboard which said, "The window through which we peer circumscribes the world we see." In some sense, this quotation represents my internal philosophy very well. As a pragmatist -- one whose scientific outlook on life has been shaped by the philosophies of Kuhn, Popper, and Rorty -- I hold on to a somewhat wishy-washy concept of truth. Perhaps I started reading Descartes when I was a bit too young, but I'm simply a sucker for Cartesian hyperbolic doubt. Perhaps I don't agree with the common man's world-view, perhaps I doubt the existence of the world outside of my head, perhaps I'm simply not willing to tell my vision system what the world is made up of.

My fascination with unsupervised techniques in computer vision is directly related to my pragmatic philosophy. Perhaps there is nothing wrong with using the objects that we have words for when we (humans) communicate, but I'm just skeptical of the idea that these high-level objects are the objects that we directly perceive. If we want to ask a vision system something about the visual world using natural language, then the vision system will clearly have to translate the English-word concept to its own internal representation of the objects it sees. However, if we want to build vision systems that can interact with the world on their own and there is no need to directly 'talk to the machines' using natural language, then why should we impose a structure on their internal representation? Why should we impose a 1:1 correspondence between the objects of language and the objects of perception?

Clearly, a hierarchical representation of objects is necessary since a linear structure simply does not scale to the large number of objects present in the world. I'm currently thinking about image-level primitives that could be used to construct such a hierarchical representation.

Wikipedia says it nicely, "A major concern of metaphysics is a study of the thought process itself: how we perceive, how we reason, how we communicate, how we speculate, and so on." I want to build robotic metaphysicians so that I can ask them 'what is the meaning of life?' Even though I know that the answer will be 42, I think it will be a fun journey.

Sunday, January 15, 2006

transcending scale and gravity learning

A machine which has primitive 'in-the-moment' direct perception of the world is one that solved the small spatio-temporal scale correspondence problem. Here, the time dimension is represented with a time-ordered sequence of images, and spatial scale refers to the distance between objects (in 3-space and/or in image-space). Being able to group together similar pixels and performing a mid-level segmentation of 1 image is the single image segmentation problem. One could imagine using a small temporal scale that corresponds to a negligible view-point variation, and registering the superpixels in that sequence. However, the ability to register superpixels across small temporal scale, aka direct perception, doesn't solve the problem of vision in its entirety.

The ability to transcend spatio-temporal scale and register objects across all of time as opposed to a small temporal window is necessary in order to have true image understanding. You can think of direct perception as the process of staring at something and not thinking about anything but the image that you see. In some sense, direct perception is not even possible for humans. However, indirect perception is the key to vision. Imagine sitting on your couch and typing on your laptop, while staring at the laptop screen. If you're in a familiar location, then you don't have to really look around too much since you know how everything is arranged; in some sense you have such a high prior on what you expect to see, that you need minimal image data (only what you see on the fringes of your vision as you stare at the monitor) to understand the world around you. Or imagine walking down a street and closing your eyes for two seconds; you can almost 'see' the world around you, yet your eyes are closed. These examples show that there is a model of the world inside of us that we can be directly aware of even when our eyes are closed. I would be willing to bet that after training on high quality image data, a real-time system would be able to understand the world with an extremely low-quality camera.

On another note, I would like to build computer vision systems that can infer fundamental physical relationships relating to observed objects such as the law of gravity. Such physical relations will come out if they help 'compress' image data; and they always do. The reason why the concept of gravity compresses image data is that it places a strong prior on the relative locations of familiar objects. For example, the conditional probability density function over the rigid-body configuration of a vehicle given the configuration of a road is significantly lower dimensional than the marginal probability density of the vehicle configuration. In layman's terms, if you know where a road is in an image then you can be pretty sure where you are going to find the cars in the image and if you don't know the location of the road then the cars could be located anywhere in the image plane. If we want intelligent agents to see the physical world around them, we have to remember that they will only be able to understand the large amount of visual data that we give them if they can compress it. In this context, compressing image data is equivalent to performing object recognition on the image. Compression will not only occur at the object-level, but also at the world-level. Object-level compression entails understanding hierarchies of objects (such as a Ford is a car) while world-level compression entails understanding the physical relationships between objects in the world. Object-level compression is important if we want to understand all of the different objects in the world, and world-level compression is necessary if we ever want to 'understand' it in a reasonable amount of time. Object-level compression is also related to the concept of meta-objects and the question of object generalization. World-level compression is related to physics and metaphysics.
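The road-and-car example can be sketched numerically. This is a toy illustration under assumptions I am making up for the purpose: the scene is reduced to a single image row for the road and a single row for the car, the road row is uniform over part of the image, and "gravity" is modelled as the car sitting within a few pixels of the road. It shows the conditional distribution being much tighter than the marginal one, which is a weaker statement than a literal drop in dimension, but it is the same intuition.

```python
import math
import random
import statistics

random.seed(1)

def sample_scene():
    """Return (road_row, car_row) for one hypothetical image."""
    road_row = random.uniform(100, 400)      # the road can be almost anywhere
    car_row = road_row + random.gauss(0, 5)  # the car rests on the road
    return road_row, car_row

scenes = [sample_scene() for _ in range(10000)]

# Spread of the car position when nothing is known about the road.
marginal_std = statistics.pstdev(car for _, car in scenes)

# Spread of the car position once the road position is known.
conditional_std = statistics.pstdev(car - road for road, car in scenes)

# Rough number of bits saved per car by knowing the road, treating both
# spreads as if they came from Gaussians with these standard deviations.
bits_saved = math.log2(marginal_std / conditional_std)

print(f"marginal std:    {marginal_std:.1f} pixels")
print(f"conditional std: {conditional_std:.1f} pixels")
print(f"bits saved:      {bits_saved:.1f}")
```

In this toy world, knowing the road shrinks the plausible car locations by more than an order of magnitude, and that shrinkage is what lets a physical prior like gravity act as compression.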

Thursday, January 12, 2006

recognition as segmentation across time

Let me start out my brief discussion with a segmentation result:

This is the New Year's Party picture that I posted a few posts ago. This is the result of Felzenszwalb's graph-based segmentation executable. I ran the executable many times by uniformly sampling the input parameter space and selected one image which looked particularly nice.

Segmentation is generally referred to as a mid-level process, i.e. a process which groups together similar regions. However, it is still not a high-level process because it doesn't use any information from other images. When the concept of segmentation was introduced into the vision community, researchers thought that it would be a good pre-processing step that would further aid object recognition. However, the community quickly realized that an object-consistent segmentation is only possible after the objects have been identified in the image!
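For the curious, a uniform sweep over the parameter space might look something like the sketch below. The executable path, the argument order (sigma, k, minimum region size, input, output), the parameter ranges, and the file names are all assumptions chosen for illustration, not a record of what I actually ran. The code only builds the command lines; it does not execute anything.

```python
import itertools

def linspace(lo, hi, n):
    """Return n evenly spaced values from lo to hi inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def segmentation_commands(image_in, sigmas, ks, min_sizes):
    """Build one command line per point on the parameter grid.

    Assumes a hypothetical executable invoked as:
        ./segment sigma k min_size input.ppm output.ppm
    """
    commands = []
    grid = itertools.product(sigmas, ks, min_sizes)
    for i, (sigma, k, min_size) in enumerate(grid):
        image_out = f"seg_{i:03d}.ppm"
        commands.append(
            ["./segment", f"{sigma:.2f}", f"{k:.0f}", str(min_size),
             image_in, image_out]
        )
    return commands

commands = segmentation_commands(
    "party.ppm",                  # hypothetical input file name
    sigmas=linspace(0.4, 1.0, 4),
    ks=linspace(200, 800, 4),
    min_sizes=[20, 100],
)

print(len(commands), "runs, for example:", " ".join(commands[0]))
```

Each command could then be handed to `subprocess.run`, after which the outputs are inspected by eye to pick the segmentation that looks best.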

Image segmentation is still very popular as of 2006; however, the traditional definition of segmentation as something you do before recognition is slowly becoming outdated. Modern research on segmentation has a significant object-recognition feel to it, and one interesting question that remains is: how does one incorporate information from a large set of images to segment 1 image?

Every problem in computer vision can be solved with an object recognition module; unfortunately, recognition is the most difficult problem of all. Computer vision is not simply image processing. Computer vision strives to build machines that can make synthetic a posteriori statements about the world. If Immanuel Kant were alive today, he would be a vision hacker.

Wednesday, January 11, 2006

we, the lunatics, and the two Smiths

I would like to briefly comment on the dystopia presented in Orwell's 1984 and its relationship to Heinlein's utopia from Stranger in a Strange Land. Winston Smith's world -- the world of 1984 -- is a world where the Party has determined how man should live his or her life. In this cold and lonesome world, man has been taught to suppress his or her emotions/instincts/desires, cast doubt on their senses, and love their patron -- Big Brother. Characteristic of this dystopia is the mechanical progression of every man's daily life; the irrational and unpredictable elements of modern human life have been deemed crimes. It is important to note that the world of 1984, as presented in the first half of the book, is the dystopia while the hypothetical world of hope and freedom that exists in Winston's mind is the utopia. Orwell presents a world where many people are aware of some shadow lurking in the background -- something wrong with the current way of life -- but they are simply too weak to do anything about it on their own.

The world of Valentine Michael Smith -- the Man from Mars -- is so magical that it transcends the word utopia. By introducing elements from science fiction, Heinlein carefully presents the reader with a world of unfettered emotion, a world introduced to humanity by the man from Mars. In Stranger in a Strange Land, the utopian state is induced by Smith's way of life and embodied by the inhabitants of the Nest (read the book to find out more about this). This utopia (by introducing elements from fiction, Heinlein makes this an extravagantly exaggerated utopia) can be compared to the human way of life the way it was before Valentine Michael Smith arrived. All that is prohibited in the world of Big Brother is encouraged in the world of Valentine Michael Smith. By presenting such a utopian way of life, perhaps Heinlein wanted us to focus on the world the way it was before it was touched by Valentine Michael Smith. Although Heinlein portrays the future as very similar to the modern world as of 2006, he nevertheless portrays it as a sort of dystopia. The big difference between the world of Stranger in a Strange Land (before the Man from Mars came) and the world of 1984 is that Oceania's denizens are somewhat aware of their predicament while the citizens of Heinlein's world only become aware of their situation after their savior comes.

Just how different is the world of 2006 compared to the world of 1984? Does one have to be aware of their predicament for the predicament to truly exist?

Monday, January 09, 2006

back in style

I'm back in Pittsburgh, reading 1984, arpeggiating 7th chords, and getting ready for classes to start.

Thursday, January 05, 2006

four books and a drive with the new deal

I recently finished Stranger In a Strange Land and I will be leaving Long Island with four new books tomorrow. I have recently purchased Tom Wolfe's I Am Charlotte Simmons, Neal Stephenson's Quicksilver, George Orwell's Nineteen Eighty-Four, and I acquired Dune by Frank Herbert as a gift. I'm not sure what to read next, but I'll gladly take any recommendations.

I will be driving from Long Island to Pittsburgh tomorrow after I drop off my brother at Maritime College. Today I downloaded The New Deal's New Year's show at B.B. King's and I look forward to listening to that on the way. I should be blogging from Shadyside shortly.

Tuesday, January 03, 2006

Stranger: done!

Stranger In a Strange Land by Robert A. Heinlein isn't your stereotypical science fiction novel. It is a novel that leaves the reader thinking about love, their own life, and ethics; therefore, I would only recommend it for the philosophically inclined.

Although under the most straightforward interpretation of the title there is nothing ambiguous about the words 'stranger' and 'strange', there is an alternate interpretation that I would like to proffer. Under the straightforward interpretation, the stranger is Valentine Michael Smith -- the man from Mars -- and the strange land is earth.

Alternatively, one can interpret the word 'stranger' in the title of this novel to relate to any human living on earth who is not accustomed to the Martian way of life, an unenlightened individual. The 'strange land' would then refer to the very same Earth -- a very strange place for the unenlightened human. Under this interpretation, I am using the "One who is unaccustomed to or unacquainted with something specified; a novice." definition of 'stranger' and the "Not previously known; unfamiliar." definition of 'strange'. These definitions were taken from the 'stranger' entry on dictionary.com.

This novel does a good job at showing how non-Martian lifestyle-following, earth-dwelling humans are novices in the game of life. The stranger is the human who has grown old and weary in his ways by adhering to the tenets of modern society. The novel shows how such a non-Martian lifestyle leaves one unfamiliar with respect to the taste of reality. The man from Mars handed out a taste of Utopia and the strangers were the humans who haven't yet shared water and grown closer. Perhaps Heinlein wanted to convey the idea that the strangeness of the world around us is intimately related to our world view; Mike wanted to make the world less strange for the denizens of Earth.

Ahead of its time (this book was released in the early 60s and not the early 70s) and promoting a pantheistic view that emphasizes personal responsibility in our own lives, this book is not only entertaining but it will leave the reader asking questions.

Experience More in 2006

I've always been troubled by the concept of a New Year's Resolution. There's nothing special about the start of a new calendar year that warrants a change in one's life. I'm not saying that people are wrong in wanting to do something such as 'exercise more' or 'quit smoking'; however, they should do those things for a reason that transcends the progression of the calendar year.

A good resolution is to Experience More. Simply put, Experiencing More translates to looking past the barriers that have been shaped around us and mandate our daily experience. Experiencing More is grokking that there is more to life than the predictable routine of daily life; it is an attempt to spice up the story of our life. The routine of modern life acts as a strong force that binds us to a particular beat of life and the first step in Experiencing More is understanding what aspects of our life are too predictable. To become a navigator of the space of all beats one has to only perturb their daily routine and be aware of the world around them as everything will take care of itself.

Aquafraternally yours,
Tomasz

Monday, January 02, 2006

new years fiesta picture

Here is a picture of Cole, Dan, me, and Roseann from the New Year's Party I attended.