Tuesday, November 04, 2008

Linear Support Vector Machine (SVM) in the Primal

I often solve linear SVMs. These are convex optimization problems, much like logistic regression, where the goal is to find a linear decision boundary between two classes. Unlike logistic regression, however, only the points near the decision boundary affect the SVM's parameters: data points that are correctly classified by a sufficient margin do not influence the solution at all.

Quite often when SVMs are taught in a Machine Learning course, the dual formulation and kernelization are introduced very quickly. While that formulation looks nice and elegant, I'm not sure how many students could go home after such a lecture and implement an SVM on their own in a language such as MATLAB.

I have to admit that I never really obtained a good grasp of Support Vector Machines until I sat through John Lafferty's lectures in 10-702: Statistical Machine Learning, which demystified them. The main idea is that an SVM is just like logistic regression but with a different loss function -- the hinge loss.
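For concreteness, with labels y in {-1, +1} and a linear score f(x) = w'x + b, the two losses being compared are:

\ell_{\text{hinge}}(y, f(x)) = \max(0,\, 1 - y f(x)), \qquad \ell_{\text{logistic}}(y, f(x)) = \log\left(1 + e^{-y f(x)}\right)

Both penalize points on the wrong side of (or too close to) the boundary, but the hinge loss is exactly zero for points classified with margin at least 1, which is why those points drop out of the solution entirely.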

A very good article on SVMs, and on how they can be efficiently tackled in the primal (which is super-easy to understand), is Olivier Chapelle's Training a Support Vector Machine in the Primal. Chapelle is a big proponent of primal optimization, and he gives some arguments for why primal approximations are better than dual approximations. On that webpage one can also find MATLAB code for SVMs which is very fast in practice since it uses a second-order optimization technique. In order to use such a second-order technique, the squared hinge loss is used (see the article for why). I have used this code many times in the past, even for linear binary classification problems with over 6 million data points embedded in 7 dimensions.

In fact, this is the code that I have used in the past for distance function learning. So the next time you want to throw something simple and fast at a binary classification problem, check out Chapelle's code.
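To give a flavor of how simple the primal view is, here is a minimal sketch -- emphatically not Chapelle's code (no Newton step, no conjugate gradients), with a made-up function name, fixed step size, and fixed iteration count -- that minimizes the L2-regularized squared hinge loss by plain gradient descent (save as primal_svm_sketch.m):

% Minimal primal linear SVM with the squared hinge loss (illustrative sketch).
% X is n-by-d, y is n-by-1 with entries in {-1,+1}, lambda weights the
% regularizer. Objective: lambda*||w||^2 + sum_i max(0, 1 - y_i*(x_i*w + b))^2
function [w, b] = primal_svm_sketch(X, y, lambda)
  d = size(X, 2);
  w = zeros(d, 1);
  b = 0;
  eta = 1e-3;                                 % fixed step size (tune for your data)
  for iter = 1:1000
    margins = y .* (X*w + b);                 % y_i * f(x_i)
    viol = max(0, 1 - margins);               % only margin violators contribute
    gw = 2*lambda*w - 2 * (X' * (viol .* y)); % gradient w.r.t. w
    gb = -2 * sum(viol .* y);                 % gradient w.r.t. b
    w = w - eta * gw;
    b = b - eta * gb;
  end
end

Chapelle's actual implementation replaces this naive loop with a Newton (or conjugate-gradient) step on the same objective, which is what makes it fast enough for millions of points.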

Thursday, October 30, 2008

creating visualizations via Google Earth and Matlab

I've recently been generating KML files in Matlab for visualizing "stuff" on the Earth's surface. It is really easy to generate overlays as well as place "placemarks" showing images. Google Earth is an amazing tool which is going to be around for quite some time. Rather than generating static figures, why not let people interact with your figures? Google Earth seems to have taken care of the interaction for you.
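As a rough illustration of what that generation looks like (the coordinates, labels, and file name below are made up, and note that KML wants longitude before latitude):

% Write a minimal KML file with one placemark per point (illustrative sketch).
lons  = [-79.94, -79.95];              % made-up longitudes
lats  = [ 40.44,  40.45];              % made-up latitudes
names = {'site A', 'site B'};

fid = fopen('points.kml', 'w');
fprintf(fid, '<?xml version="1.0" encoding="UTF-8"?>\n');
fprintf(fid, '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n');
for i = 1:numel(lons)
  fprintf(fid, '  <Placemark>\n    <name>%s</name>\n', names{i});
  fprintf(fid, '    <Point><coordinates>%f,%f,0</coordinates></Point>\n', ...
          lons(i), lats(i));
  fprintf(fid, '  </Placemark>\n');
end
fprintf(fid, '</Document>\n</kml>\n');
fclose(fid);

Ground overlays work the same way: a GroundOverlay element with an Icon href pointing at your image and a LatLonBox giving its bounds.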

In addition, if you have lots of data you can use the level-of-detail capabilities of KML to cut your data up into little chunks. It's just like an octree from your computer graphics class, and pretty easy to code up. All your data can be stored on a server, so there is no need to have anybody download it up front. It's also possible to make a web server emit the .kml files with the proper MIME type, rather than as plain text, so that Google Earth can open them directly; a sketch of the level-of-detail tags follows below.
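The level-of-detail machinery is just a few more tags: each chunk gets a Region with a LatLonAltBox and a Lod, and a NetworkLink tells Google Earth to fetch the chunk's .kml file only once its region covers enough pixels on screen. A sketch with made-up bounds and file names:

% Emit one network link that loads child.kml only when its region occupies
% at least 128 pixels on screen (bounds and file names are made up).
fid = fopen('lod_link.kml', 'w');
fprintf(fid, ['<?xml version="1.0" encoding="UTF-8"?>\n' ...
  '<kml xmlns="http://www.opengis.net/kml/2.2">\n' ...
  '<NetworkLink>\n' ...
  '  <Region>\n' ...
  '    <LatLonAltBox><north>40.5</north><south>40.4</south>' ...
  '<east>-79.9</east><west>-80.0</west></LatLonAltBox>\n' ...
  '    <Lod><minLodPixels>128</minLodPixels><maxLodPixels>-1</maxLodPixels></Lod>\n' ...
  '  </Region>\n' ...
  '  <Link><href>child.kml</href><viewRefreshMode>onRegion</viewRefreshMode></Link>\n' ...
  '</NetworkLink>\n</kml>\n']);
fclose(fid);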

Once I *finalize* some kml files I'll show simple examples of the Level of Detail visualizations.

Sunday, September 21, 2008

6.870 Object Recognition and Scene Understanding

This semester at MIT, Antonio Torralba is teaching 6.870 Object Recognition and Scene Understanding. Topics include object recognition, multiclass categorization, objects in context, internet vision, 3D object models, scene-level analysis, as well as "What happens if we solve object recognition?"

Why should anybody care what courses new faculty are offering, especially if they are taught at another academic institution? The answer is simple. The rising stars (a.k.a. the new faculty) teach graduate-level courses that reflect ideas they are truly passionate about. Aside from the first few semesters, when new faculty sometimes have to teach introductory courses, these special-topics courses (most often grad-level) reflect what has been going on in their heads for the past ten years. Such a course distills the past decade of the new professor's research interests, and the material is often presented in a way that inspires students and gives them the best opportunity to one day surpass the professor. I'm a big advocate of letting faculty teach their own courses -- of course, introductory undergraduate courses still have to be taught somehow...

A new professor's publication list depicts what kind of research was actually pursued; the material comprising a special-topics course, however, presents a theme -- a conceptual layout -- which is sometimes a better predictor of where a professor's ideas (and, indirectly, the community's) are actually going long-term. If you want to see where Computer Vision is going, just look at what the new faculty are teaching.

On the course homepage, Professor Torralba mentions other courses taught at other universities by the new breed of professors, such as Computer Vision by Rob Fergus and Internet Vision by Tamara Berg. Note: I have only listed Antonio's, Rob's, and Tamara's courses since they are Fall 2008 offerings -- many other courses exist, but they are from Fall 2007 or other semesters.

Thanks to my advisor, Alyosha Efros, for pointing out this new course.

On another note, I'm back at CMU from a Google summer internship where I was working with Thomas Leung on computer vision related problems.

Sunday, August 10, 2008

What is segmentation? What is image segmentation?

According to Merriam-Webster, segmentation is "the process of dividing into segments" and a segment is "a separate piece of something; a bit, or a fragment." This is a rather broad definition which suggests that segmentation is nothing mystical -- it is just taking a whole and partitioning it into pieces. One can segment sentences, time periods, tasks, inhabitants of a country, and digital images.

Segmentation is a term that often pops up in technical fields such as Computer Vision. I have attempted to write a short article on Knol about image segmentation and how it pertains to Computer Vision. Deciding to abstain from discussing specific algorithms -- which might interest graduate students rather than the population as a whole -- I instead target the high-level question, "Why segment images?" The answer, according to me, is that image segmentation (like any other image processing task) should be performed solely to assist object recognition and image understanding.

Wednesday, July 30, 2008

The Duality of Perception and Action

If we have managed to build an unambiguous 3D model of the viewed scene, is the process then complete? The answer to this question must be no. Computer vision systems, per se, are only part of a more embracing system which is simultaneously concerned with making sense of the environment and interacting with the environment. Without action, perception is futile, without perception, action is futile. Both are complementary, but strongly related activities and any intelligent action in which the system engages in the environment, i.e., anything it does, it does with an understanding of its action, and it gains this quite often by ongoing visual perception. Computer vision, then, is not an end in itself, that is, while the task of constructing an unambiguous explicit 3D representation of the world is a large part of its function, there is more to vision than just the explication of structural organisation. In essence, computer vision systems, or image understanding systems, are as concerned with cause and effect with purpose, with action and reaction as they are with structural organisation.

This is a rather remarkable excerpt from "Advanced Image Understanding and Autonomous Systems" by David Vernon of the Department of Computer Science at Trinity College Dublin, Ireland.

Thursday, July 24, 2008

More Newton's Method Fractals on Youtube

[Image: a Newton's method fractal]
I've posted another cool fractal video. I used ffmpeg and imagemagick to get the screenshots from an OpenGL C++ program running on my Macbook Pro.
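For anyone curious what generates these pictures, here is a tiny MATLAB sketch (nothing to do with the OpenGL program behind the video; grid size and iteration count are arbitrary) that colors each pixel by which root of z^3 - 1 Newton's method converges to from that starting point:

% Newton's-method fractal for f(z) = z^3 - 1 (illustrative sketch).
[x, y] = meshgrid(linspace(-1.5, 1.5, 600));
z = x + 1i*y;                            % each pixel is a starting point
for iter = 1:30
  z = z - (z.^3 - 1) ./ (3*z.^2);        % Newton update
end
roots3 = exp(2i*pi*(0:2)/3);             % the three cube roots of unity
[dist, basin] = min(abs(bsxfun(@minus, z(:), roots3)), [], 2);
imagesc(reshape(basin, size(z)));        % color by basin of attraction
axis image; axis off;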

Friday, July 11, 2008

Learning Per-Exemplar Distance Functions == Learning Anisotropic Per-Exemplar Kernels for Non-Parametric Density Estimation

When estimating probability densities, one can be parametric or non-parametric. A parametric approach goes like this: a) assume the density has a certain form, then b) estimate the parameters of that distribution. A single Gaussian is a simple example. In the parametric case, having more data means that we get "better" estimates of the parameters. Unfortunately, the parameters will only be "better" if the modeling distribution is close to "the truth."
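In MATLAB terms, the entire parametric "learning" step for a single Gaussian is just the two estimates below (the data and query point are made up; with the Statistics Toolbox, mvnpdf does the last line for you):

% Parametric density estimate: fit one Gaussian, then evaluate it at a point.
X  = randn(1000, 2);                     % made-up 2-D data
mu = mean(X);                            % estimated mean (1-by-2)
C  = cov(X);                             % estimated covariance (2-by-2)
x0 = [0.5, -0.2];                        % query point
d  = size(X, 2);
p  = exp(-0.5*(x0 - mu)*(C\(x0 - mu)')) / sqrt((2*pi)^d * det(C));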

Non-parametric approaches make very weak assumptions about the underlying distribution -- however, the number of parameters in a non-parametric density estimator scales with the number of data points. Generally, estimation proceeds as follows: a) store all the data, b) use something like a Parzen density estimate, where a tiny Gaussian kernel is placed around each data point.
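A minimal sketch of the non-parametric counterpart (same made-up data and query point as above; the bandwidth sigma is simply picked by hand here):

% Parzen density estimate: an isotropic Gaussian kernel on every data point.
X     = randn(1000, 2);                  % made-up 2-D data -- "store all the data"
sigma = 0.25;                            % kernel bandwidth
x0    = [0.5, -0.2];                     % query point
d     = size(X, 2);
sq    = sum(bsxfun(@minus, X, x0).^2, 2);            % squared distances to x0
p     = mean(exp(-sq / (2*sigma^2))) / (2*pi*sigma^2)^(d/2);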

The theoretical strength of the non-parametric approach is that *in theory* we can approximate any underlying distribution. In reality, we would need a crazy amount of data to do so -- the pragmatic answer to "why be non-parametric?" is that there is no real learning going on. In a non-parametric approach, we don't need a parameter estimation stage. While estimating the parameters of a single Gaussian is quite easy, consider a simple Gaussian mixture model -- the parameter estimation (generally done via EM) is not so simple.

With the availability of inexpensive memory, data-driven approaches have become popular in computer vision. Does this mean that by using more data we can simply bypass parameter estimation?

An alternative is to combine the best of both worlds: use lots of data, make very few assumptions about the underlying density, and still allow learning to improve the density estimates. Usually a single isotropic kernel is used in non-parametric density estimation -- if we really have no prior knowledge, this might be the only sensible choice. But maybe something better than an isotropic kernel can be used, and perhaps we can use learning to find the right shape for the kernel.

The local per-exemplar distance function learning algorithm that I presented in my CVPR 2008 paper can be thought of as such an anisotropic per-data-point kernel estimator.
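To make that correspondence concrete, here is a rough sketch of a per-exemplar anisotropic (Mahalanobis-style) kernel estimate. The per-point matrices M{i} below are random placeholders standing in for whatever a learning procedure would produce -- this is not the actual output or code of the CVPR 2008 method:

% Per-exemplar anisotropic kernel density estimate (illustrative sketch).
X = randn(500, 2);                       % stored exemplars (made up)
n = size(X, 1);
M = cell(n, 1);
for i = 1:n
  A    = 0.3 * randn(2);
  M{i} = A*A' + 0.05*eye(2);             % placeholder positive-definite metric
end
x0 = [0.5, -0.2];                        % query point
p  = 0;
for i = 1:n
  dx = (x0 - X(i, :))';                  % column vector
  p  = p + exp(-0.5 * (dx' * (M{i} \ dx))) / (2*pi*sqrt(det(M{i})));
end
p = p / n;                               % average the per-exemplar kernels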

I put some MATLAB distance function learning code up on the project web page. Check it out!