Tombone's Computer Vision Blog: September 2005

Thursday, September 29, 2005

reproducing kernel hilbert space

After a talk by Jean-Francois yesterday I decided to learn a little bit about RKHS: Reproducing Kernel Hilbert Space. So what's this new vector space that everybody is talking about?

The RKHS is easily grasped when one draws the analogy between the role of the Dirac delta in L2 and the reproducing kernel in an RKHS. The problem is that the Dirac delta isn't in L2. Because of this, the reproducing kernel of L2 isn't in L2. Due to the insanely small support size of the Dirac delta, some non-smooth functions are in L2. In fact, we need two different notions of convergence for L2: convergence in the mean and pointwise convergence.

In an RKHS, convergence in the mean implies pointwise convergence. In an RKHS, the reproducing kernel usually has some support and therefore only smooth functions lie in this space. The reproducing kernel of an RKHS is actually in the RKHS! Great!
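The claim that convergence in the mean implies pointwise convergence can be sketched in one line. This is a rough derivation under assumptions that are not spelled out in the post: H is the RKHS, K is its reproducing kernel, K_x = K(., x) is the kernel centered at x, and "convergence in the mean" is read as convergence in the norm of H (which need not be the L2 norm).

```latex
% Reproducing property: evaluating f at x is an inner product with K_x.
f(x) = \langle f, K_x \rangle_{H}, \qquad K_x = K(\cdot, x) \in H .

% Cauchy-Schwarz then bounds the pointwise error by the norm error:
|f_n(x) - f(x)|
  = |\langle f_n - f, K_x \rangle_{H}|
  \le \|f_n - f\|_{H} \, \|K_x\|_{H}
  = \|f_n - f\|_{H} \, \sqrt{K(x, x)} .
```

So if the norm of f_n - f goes to zero, then f_n(x) goes to f(x) at every x where K(x, x) is finite. The step breaks in L2 exactly because the Dirac delta has no finite norm there.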

Did I get that right? I have to read Michael Jordan's notes again.

Tuesday, September 27, 2005

it's all about the Bayes Nets

Currently thinking about: Bayesian Networks.
Currently reading: Michael Jordan, Andrew Ng, Chris Bishop, Kevin C. Murphy.

Thursday, September 22, 2005

The soul and the extended phenotype

I had an interesting thought today when I was thinking about the 'soul' and machine learning. Today's epiphany revolves around two key ideas:
a.) why so many people in AI/Robotics are obsessed with Machine Learning (experience at CMU)
b.) how a particular genotype has broad impact on the rest of the world (Dawkins' Extended Phenotype)

Let's start with the evolutionary theory first. According to Richard Dawkins, the term 'extended phenotype' refers to the influence of a gene beyond the organism which is a container for the gene. Dawkins says that we have to look beyond the effects of a gene on the organism which serves as a container for the gene; we have to look at the cases where the gene influences the survival of an organism by its broad-reaching influence. In The Selfish Gene, Dawkins discusses interesting cases where a virus (with its own genes) will infect an organism and through its own extended phenotype it will help the organism survive. This is a case where the virus helps the organism and the organism helps the virus.

Imagine this process of mutual symbiosis has been happening over millions of years as species evolved. In the same sense that Dawkins uses the term 'extended phenotype' to denote that the gene has a long reach outwards into the world, we can also look at the long reach of the world inwards. If the outside environment is made up of organisms which possess their own genes, then just as much as we are influencing them they are also influencing us. Over the insanely long amount of time that we have been evolving, the outside world has touched us deeply. We have acquired new genes simply because we are creatures which interact with the world (and the world interacts with us). There is a part of the outside world 'inside' us. This intimate relationship we have with the environment is a result of us evolving with the world. This pantheistic view that a part of the world is inside of us is what gives us our 'soul.'

Now I have something to say about Machine Learning. ML refers to algorithms that change their internal state once they observe some data. This is analogous to the process of the world becoming a part of us throughout the process of evolution. Machine Learning is concerned with algorithms that are trying really hard at building up a soul.

These esoteric ideas can be rendered pellucid via the artificial neural network analogy. An ANN contains hidden nodes whose weights are updated when new input/output pairs are presented. These weights are actually dependent on the input/output pairs. Sometimes these weights correspond to latent variables (hidden states) of the world, but it is only important to realize that these weights are highly correlated with the types of input/output pairs that have been used to update them. Consider person 'A' who spent their entire life in NYC (they were looking at buildings and crowded city scenes their entire life), and person 'B' who spent their entire life in the Sahara Desert (they were looking at sand dunes all of their life). Clearly, person 'B' will have a hard time getting their way around a metropolitan area while person 'A' will struggle to find their way in any type of desert. This is because the spatio-temporal patterns that they have been accustomed to seeing have been engraved in their hidden weights. NYC is in some sense 'inside' of person 'A' while the Sahara is inside of person 'B'.
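Here is a toy sketch of that idea in Python. It is not a real ANN with hidden nodes, just a single linear unit trained with the delta rule, and the two data sets (`city_pairs`, `desert_pairs`) are made up for illustration. The point is only that the same learner, started from the same blank state, ends up with different weights depending on the input/output pairs it has seen.

```python
def train_neuron(pairs, epochs=500, lr=0.1):
    """Train one linear unit y = w * x + b with the delta rule.

    `pairs` is a list of (input, target) tuples. Returns the learned (w, b).
    """
    w, b = 0.0, 0.0  # both learners start from the same blank state
    for _ in range(epochs):
        for x, y in pairs:
            err = (w * x + b) - y  # prediction error on this pair
            w -= lr * err * x      # move the weight against the error
            b -= lr * err          # move the bias against the error
    return w, b


# Hypothetical "worlds": each one follows a different linear rule.
city_pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]    # generated by y = 2x + 1
desert_pairs = [(0.0, 4.0), (1.0, 3.0), (2.0, 2.0)]  # generated by y = -x + 4

w_city, b_city = train_neuron(city_pairs)
w_desert, b_desert = train_neuron(desert_pairs)
print("city learner:  ", w_city, b_city)
print("desert learner:", w_desert, b_desert)
```

Each learner recovers (approximately) the rule of the world it grew up in, and does badly if dropped into the other one.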

Sunday, September 18, 2005

google maps Shadyside jogging path

Here is a google maps plan of my Shadyside jogging path. I ran this route yesterday in about 35 minutes.

bidirectional search: looking for Marr

Unifying ↓↑ (bottom-up) and ↑↓ (top-down) approaches to computer vision reminds me of bidirectional search (an algorithm that is generally taught in introductory Artificial Intelligence courses).
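For reference, here is a minimal sketch of bidirectional search as breadth-first search run from both ends at once. The assumptions are mine: the graph is unweighted and undirected, it is stored as an adjacency dict, and the function name and the example graph are made up for illustration.

```python
from collections import deque


def bidirectional_search(graph, start, goal):
    """Return the number of edges on a shortest start-goal path, or None.

    `graph` maps each node to a list of neighbors and is assumed to be
    undirected (every edge is listed in both directions).
    """
    if start == goal:
        return 0
    dist_fwd, dist_bwd = {start: 0}, {goal: 0}
    frontier_fwd, frontier_bwd = deque([start]), deque([goal])

    while frontier_fwd and frontier_bwd:
        # Always grow the smaller frontier, one whole BFS level at a time.
        if len(frontier_fwd) <= len(frontier_bwd):
            frontier, dist_this, dist_other = frontier_fwd, dist_fwd, dist_bwd
        else:
            frontier, dist_this, dist_other = frontier_bwd, dist_bwd, dist_fwd

        for _ in range(len(frontier)):
            node = frontier.popleft()
            for neighbor in graph.get(node, ()):
                if neighbor in dist_other:
                    # The two searches have met: join the two half-paths.
                    return dist_this[node] + 1 + dist_other[neighbor]
                if neighbor not in dist_this:
                    dist_this[neighbor] = dist_this[node] + 1
                    frontier.append(neighbor)
    return None  # one side ran out of nodes, so there is no path


# Hypothetical example graph.
graph = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
    "D": ["B", "C", "E"], "E": ["D", "F"], "F": ["E"], "G": [],
}
print(bidirectional_search(graph, "A", "F"))
print(bidirectional_search(graph, "A", "G"))
```

The two frontiers meet somewhere in the middle, which is the same picture as bottom-up and top-down vision meeting halfway.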

I need to find a copy of David Marr's book Vision. Or perhaps this book is already looking for me.

Wednesday, September 14, 2005

research advisor

Today I was officially bound to my research advisor, who obtained his PhD in 2003. The other person that I was debating working with obtained his PhD in 1974. Youth = passion = new ideas.

I came to CMU wanting to work on high level vision tasks such as object detection/recognition and machine learning, and now I'm doing it. I should start planning for the future; I'll be out of here in no time.

I have to start working on my new research webpage at CMU. I will then update my new research directions.

Tuesday, September 13, 2005

I love to approximate {functions,data} with other {functions,kernels}

Today was the first class of Machine Learning. This first-rate course is being taught by renowned ML researchers Tom Mitchell and Andrew Moore. In fact, the primary text for the course was written in 1997 by Tom Mitchell.

Camera Calibration = unexciting
Parzen Density Estimation = exciting!
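For the curious, here is a minimal sketch of Parzen density estimation in one dimension. The Gaussian window, the bandwidth value, and the sample points are all my own choices for illustration; nothing here comes from the course.

```python
import math


def parzen_density(x, samples, bandwidth):
    """Estimate p(x) by averaging Gaussian windows centered on each sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


# Hypothetical data: three points clustered near 0.5 and one outlier at 4.
samples = [0.0, 0.5, 1.0, 4.0]
for x in (0.5, 2.5, 4.0, 8.0):
    print(x, parzen_density(x, samples, bandwidth=0.5))
```

The estimate is high near the cluster, lower near the lone point at 4, and close to zero far from all of the data. A smaller bandwidth gives a spikier estimate and a larger one gives a smoother estimate.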

If you find yourself bored one day, take a look at the online book titled Linear Methods of Applied Mathematics: Orthogonal series, boundary-value problems, and integral operators. It's delicious.

Monday, September 12, 2005

Vision is not inverse optics

While thinking about the microstructure of rough materials and microfacet lighting models such as the Torrance-Sparrow or Oren-Nayar models, I came to the hypothetical epiphany that vision is not inverse optics.

I should clarify. There are two types of vision, namely computational human vision and computational extraterrestrial vision. Computational human vision is concerned with high level vision tasks such as object detection, object learning/discovery, and overall scene understanding. Computational extraterrestrial vision is concerned with understanding how light interacts with matter and how we can infer low level properties of substances given their images. We should look at this computational extraterrestrial vision goal as something that would help scientists see things that the naked eye cannot see. However, I vehemently protest the idea that we need anything like an Oren-Nayar lighting model to be able to do object detection in the way humans can do it.

When I was younger (4 years ago) I wanted to be a theoretical physicist. Back then I envisioned that in graduate school I would be writing simulations of quantum chromodynamics. I thought that by starting with small things (gluons, quarks, photons, electrons) I could one day help put together all of the pieces scientists have been collecting over the years. However, I have abandoned this goal of understanding the world via physics. I have little faith in the bottom-up approach to modeling reality.

I believe that by studying computational human vision, I am following the Top-Down approach to modeling reality. For a long time I've had this vision of a new quantum mechanics, a new physics where the indivisible units are 'cats' and 'trees' and 'cars,' namely the indivisible units of human experience.

I used to Aeolian, now I'm Dorian

My musical style has slightly changed as I now focus more on changing my modes while I play guitar instead of always going back to the Aeolian minor mode. I'm still playing the same notes, but I tend to focus a lot more on the minor7, major7, 7 trio.

A few years back I was an Eminor-->Dmajor type of player. Then a few months ago I started dabbling in minor7 and major7 chords. I find myself often playing the following pattern that I found on wholenote (like I Will Survive by Cake):

Am7 Dm7 G7 Cmaj7 Fmaj7 Bbmaj7 Esus4 E

I tend to play a lot more Dorian than Aeolian these days.

Saturday, September 10, 2005

A Race and A Purchase

Yesterday I participated in the Carnegie Mellon SCS Pretty Good Race. I placed 20th out of roughly 47 contestants. For about a half of a mile (the last stretch) I was a few feet behind some girl, but at the end I had enough juice left to sprint and pass her. Little did I know that she was the first girl to finish (6 seconds behind me).

I bought some new stylish running gear today at Dick's sporting goods store. Most importantly I bought new running shoes, namely Asics GT-2100 shoes. The box was improperly marked, so I only paid $59.99 instead of the normal $79.99 discount price ($89.99 when not on sale). I also bought a headband (to stop the nasty hair gel/sweat mixture from hurting my eyes while I run) and some expensive running shorts made out of some special fabric. Later that day I tested out my new gear. :-)

Wednesday, September 07, 2005

seeing with our feet and hands: quantum mechanics for you

Can we see without moving around with our legs (the things responsible for allowing us to change our viewpoint with respect to a stationary object)? Can we see without our hands (which allow us to manipulate objects so as to change our viewpoint)?

In the context of a computational theory of vision, can we truly expect an algorithm to understand what objects are if we keep feeding it images, never letting it explore the world? I've been mentally preparing myself for Alva Noe's book (see last post) by trying to think about what he is about to tell me. Can we have perception without action?

Then again, what do I know? I know that vision research has been stagnating for the past few decades. Why would I care what a philosopher at Berkeley has to say? Why not read vision papers? The answer is not clear, but the expression that comes to mind is Kuhn's "paradigm shift." Something tells me that physics and philosophy are going to be big parts of my future research. Unfortunately (fortunately perhaps) I will be forced to interact with the mainstream vision community.

Adieu

Tuesday, September 06, 2005

two books and a signature

After an oral examination with the professor, he decided that I was qualified to waive the Math Fundamentals for Robotics course. I did study. He could probably tell when he asked me "Do you know what the Calculus of Variations is?" and I replied, "Would you like to see the derivation or the Euler-Lagrange equation?"

I went to the library and got two books. The first one is Shimon Edelman's Representation and Recognition in Vision. The second book is Alva Noe's Action in Perception. I was aroused by this book after reading Edelman's short reply to Action in Perception; this paper can be found here.

To quote Noe, "The main idea of this book is that perception isn't something that happens inside us (in our brains say). It is something we do." I feel that to push the field of computer vision to the next level, I must know what these philosophers are up to. A long time ago I could have been found in a philosophy class arguing about something pointless, but I shortly abandoned my futile project to fully dedicate my time to physics and computer science. After the code and the calculus, I believe that I have reached a level of maturity which allows me to revisit philosophy.

Monday, September 05, 2005

schooling algorithms: bringing babies to school

It appears that Google might not be the best source of data for training a machine vision system (a baby) in the early stages. I will use the word baby to loosely refer to the current state of the art in machine vision. Before a teenage vision algorithm can learn to navigate the nasty world on its own, it must learn the basics of vision and object classification in elementary school.

What I'm trying to say is that there is a time for everything; there is a time for unsupervised learning. Try using Google images to search for images of 'shoe' and you will find that the third image is a 10-foot high-heeled shoe. A human will understand that this is still a shoe even though its scale is out of whack. When trying to teach a baby what the word 'shoe' means, it is a bad idea to show it 10-foot-high statues of shoes.

By the way, Google images returns too many synthetic and manually edited images of objects. These unrealistic scenarios are not good for training babies.

Saturday, September 03, 2005

Investing in America and Donating to Red Cross

I just donated 25$ to the Red Cross because of the Katrina debacle. I don't think 25$ is a lot, but I think that if a grad student living from a stipend can afford 25$ then affluent Americans can afford to donate a few more dollars. The funds will add up; I have faith in multiplication.

It's not like I'm simply giving away my money, I'm investing in America. I know there will be a time when some other part of America (perhaps the part where I reside) will require help, and I hope a similar mentality will drive some other graduate student on the other side of the US to donate his 25$ (and perhaps help me eat).

My friend Mark recently put up this http://robogradshelp.blogspot.com/ for robograds to donate to the Red Cross. This site is merely a gateway to the official Red Cross site. So far we have a few 1st year robograds who donated 25$.

Thursday, September 01, 2005

python and google

I want to one day work for Google. And I want to write Python code while I'm there. It can happen.

I've been thinking about what I want my PhD to give me. Freedom is what I want. I don't want to work for anybody. We'll see how that goes.