Wednesday, May 31, 2006

Two weeks in Poland

I'm currently packing for my two-week trip to Poland. After this little adventure, I'll be going to CVPR in NYC. By the end of June I'll be back in Shadyside, PA.

Friday, May 12, 2006

the killer app of computer vision

What is the killer application of computer vision? In other words, how useful are machines that can visually detect objects in images?

The easiest application to think of is image retrieval. In this setting, a user supplies either an image or some text, and the system returns other images related to the input, along with some information that explains how each result relates to it. Surely companies like Google would be interested in such applications, but isn't there more that we could get out of computer vision?
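To make the query-by-example case a bit more concrete, here is a minimal sketch. It assumes each image has already been reduced to a fixed-length feature vector (the tiny "color histograms" and file names below are made up purely for illustration) and that cosine similarity is an acceptable relevance measure; a real retrieval system would use much richer features and an index rather than a linear scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def retrieve(query, database, k=3):
    """Return the k (name, score) pairs most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, feats)) for name, feats in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical 4-bin color histograms standing in for real image features.
database = {
    "beach.jpg": [0.1, 0.2, 0.6, 0.1],
    "forest.jpg": [0.1, 0.7, 0.1, 0.1],
    "ocean.jpg": [0.0, 0.2, 0.7, 0.1],
}
query = [0.05, 0.2, 0.65, 0.1]

for name, score in retrieve(query, database, k=2):
    # The score is the "information that relates" each result to the query.
    print(f"{name}: {score:.3f}")
```

Returning the similarity score alongside each image is the simplest stand-in for the extra information a retrieval system might attach to its results.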

When I was younger I was very interested in particle physics, and I even finished my undergrad with a dual degree in Computer Science and Physics. I was impressed by the way computational techniques could be used to 'get at' the world. Large-scale simulations and data analysis could be used to infer the structure of the world (or, given an assumed structure, at least to fit its parameters).

Could we train machines that can infer relationships between objects in the world? Can a machine infer Newtonian-like properties of the world such as mass and gravity (and thus establish a metaphysics) from visual observations? I think the big question here is the following: can we train machines to 'see' objects without those machines first understanding any properties of the dynamic world? When I say 'properties of the dynamic world,' I do not mean appearance variations, but things like 'objects have mass, and things with mass do not just float in ambient space,' and 'things in motion tend to stay in motion.'

Sunday, May 07, 2006

first year of graduate school: lessons learned

A few days ago I went to the final class of my first year of graduate school. I have just finished the first year of the Robotics PhD program at Carnegie Mellon University (well, a few things are still due, but I have no more classes).

In my first semester I took Appearance Modeling (a graphics/vision course) and Machine Learning. This past semester I took Advanced Perception (aka Vision 2) and Kinematics, Dynamics & Control (a robotics/physics course). In addition to these courses, I have been regularly attending CMU's Computer Vision misc-read reading group. I have also had the opportunity to collaborate with many other graduate students on course projects.

The course projects that I worked on are:

Demultiplexing Interreflections with Jean-Francois
Modeling Text Corpora with Latent Dirichlet Allocation with Jon
Learning to Walk without a Leash with Geoff and Mark
Detecting Objects with Multiple Segmentations and Latent Dirichlet Allocation with Jon

I probably learned the most new concepts from the Machine Learning community. Graphical models are definitely very trendy in Computer Vision in 2006; almost everyone wants to be Bayesian about their random variables. After this first year of graduate school, I've expanded my vocabulary to include terms from topics such as SVMs, kernel methods, spectral clustering, manifold learning, graphical models, texton-based texture modeling, boosting, MCMC, EM, density estimation, pLSA, variational inference, Gibbs sampling, RKHS...

However, the one key piece of advice that I keep hearing over and over is the following:
Do not blindly throw Machine Learning algorithms at a vision problem in order to beat the performance of an existing algorithm.

I don't think I'll stray from Machine Learning; however, it is very important to understand what a Machine Learning algorithm is doing and when it works and when it fails. On another note, next semester I'm taking Carlos Guestrin's Probabilistic Graphical Models course.