Tuesday, November 04, 2008

Linear Support Vector Machine (SVM) in the Primal

I often solve linear SVMs. These are convex optimization problems, much like logistic regression, where the goal is to find a linear decision boundary between two classes. Unlike logistic regression, only the points near the decision boundary affect its parameters. Data points that are correctly classified by a margin do not influence the decision boundary.

Quite often when SVMs are taught in a Machine Learning course, the dual and kernelization are jumped into very quickly. While the formulation looks nice and elegant, I'm not sure how many students can come home after such a lecture and implement an SVM in a language such as MATLAB on their own.

I have to admit that I never really obtained a good grasp of Support Vector Machines until I sat through through John Lafferty's lectures in 10-702: Statistical Machine Learning which demistyfied them. The main idea was that an SVM is just like logistic regression but with a different loss function -- the hinge loss function.

A very good article written on SVMs, and how they can be efficiently tackled in the primal (which is super-easy to understand) is Olivier Chapelle's Training a Support Vector Machine in the Primal. Chapelle is a big proponent of primal optimization and he has some arguments on why primal approximations are better than dual approximations. On this webpage one can also find MATLAB code for SVMs which is very fast in practice since it uses a second order optimization technique. In order to use such a second order technique, the squared hinge loss is used (see the article for why). I have used this code many times in the past even for linear binary classification problems with over 6 million data points embedded in 7 dimensions.

In fact, this is the code that I have used in the past for distance function learning. So the next time you want to throw something simple and fast at a binary classification problem, check out Chapelle's code.


  1. Your post is above me! I think in pictures not numbers. I am looking for someone interested in quantum mechanics you look like the right person. I have a theory on spacetime. what I need is someone to look at it who knows what they are looking at. I hope you don't mind me dropping in.


  2. I find it odd that this fundamental point is so deeply obscured in all the major textbooks. It's only when I dug into stochastic gradient descent optimizers that I realized this stuff's all the same. If you check out Leon Bottou's SGD pages and talks, this is pretty clear.

    Once you realize this, you can start monkeying with the loss in theory, though it's tough when you get outside of convex loss functions. Here's an overview of Martin Jansche's approach using F-measure loss:
    F-measure Loss for "Logistic Regression"

    It also took a while for it to dawn on me that lasso and ridge regression, L1 and L2 regularization, and Laplace and Gaussian priors were three sets of names for the same two things. And that max entropy was just logistic regression (how you see it in comp ling), as are one-level neural nets with softmax activation (how McKay's book refers to it). I read Hastie et al. without ever putting the pieces together. I summarized in a blog entry and our javadoc:
    Logistic Regression by Any Other Name.

  3. sarah2:17 AM

    i stumbled across your blog in desperation of finding journals on SVm and MATLAB for doing my project
    the problem is, i never did found any articles or tutorials that could easily make me understand and grasp the whole process of SVM, let alone applying it in my problem.
    i hope you could help me out a bit since i think you have done the experiment yourself.
    Of course, i am having problems in adapting the problem with mathlab as well.

    In your opinion,How can i use SVM and Matlab to solve an age group classification problem?
    i know it is not impossible to solve that problem, yet i have not found anything that could help me understand in implementing SVM in MATLAB , apart from the SVMtrain function in the biometrics tool in MATLAB,(which i cannot understand enough for me to finish my project)

    i am terribly sorry if this is in any way troubling you.
    but i really hope you could help me with this.

    by the way, im sarah,
    and im doing a degree in artificial intelligence.

  4. Hi Sarah,

    I would recommend taking a graduate level Machine Learning course if you really want to understand SVMs. While this will give you the tools to understand linear classification techniques such as logistic regression and SVMs, you will probably want to use a popular (and highly optimized package) for you real scientific experiments.

    You don't really have to be able to understand all the details of the optimization process involved in using kernelized SVMs to successfully use them in your own work.

    There is nothing like taking a grad level Machine Learning course. Hopefully it will be intense and involve lots of difficult problem sets which require extensive programming. That's probably the best way to learn, IMHO.

  5. very interesting!