Tuesday, November 04, 2008

Linear Support Vector Machine (SVM) in the Primal

I often solve linear SVMs. These are convex optimization problems, much like logistic regression, where the goal is to find a linear decision boundary between two classes. Unlike logistic regression, only the points near the decision boundary affect its parameters. Data points that are correctly classified by a margin do not influence the decision boundary.

Quite often when SVMs are taught in a Machine Learning course, the dual and kernelization are jumped into very quickly. While the formulation looks nice and elegant, I'm not sure how many students can come home after such a lecture and implement an SVM in a language such as MATLAB on their own.

I have to admit that I never really obtained a good grasp of Support Vector Machines until I sat through through John Lafferty's lectures in 10-702: Statistical Machine Learning which demistyfied them. The main idea was that an SVM is just like logistic regression but with a different loss function -- the hinge loss function.

A very good article written on SVMs, and how they can be efficiently tackled in the primal (which is super-easy to understand) is Olivier Chapelle's Training a Support Vector Machine in the Primal. Chapelle is a big proponent of primal optimization and he has some arguments on why primal approximations are better than dual approximations. On this webpage one can also find MATLAB code for SVMs which is very fast in practice since it uses a second order optimization technique. In order to use such a second order technique, the squared hinge loss is used (see the article for why). I have used this code many times in the past even for linear binary classification problems with over 6 million data points embedded in 7 dimensions.

In fact, this is the code that I have used in the past for distance function learning. So the next time you want to throw something simple and fast at a binary classification problem, check out Chapelle's code.