Saturday, August 13, 2011

blazing fast nms.m (from exemplar-svm library)

If you care about building large-scale object recognition systems, you have to care about speed.  And every little bit of performance counts -- so why not first optimize the stuff which is slowing you down?

NMS (non-maximum suppression) is a very popular post-processing method for eliminating redundant object detection windows.  I have take Felzenszwalb et al.'s nms.m and made it significantly faster by eliminating an inner loop.  6 years of grad school, 6 years of building large-scale vision systems in Matlab, and you really learn how to vectorize code.  The code I call millions of times needs to be fast, and nms is one of those routines I call all the time.

The code is found below as a Github gist -- which was taken from my Exemplar-SVM object recognition library (from my ICCV2011 paper: Ensemble of Exemplar-SVMs for Object Detection and Beyond).  The same file, nms.m, can also be found a part of the Exemplar-SVM library on Github.  In fact, this code produces the same result as Pedro's code, but is much faster.  Here is one timing experiment I performed when performing nms on ~300K windows, where my version is roughly 100 times faster.  When you deal with exemplar-SVMs you have to deal with lots of detectors (i.e., lots of detection windows), so fast NMS is money.

>> tic;top = nms_original(bbs,.5);toc
Elapsed time is 58.172313 seconds.

>> tic;top = nms_fast(bbs,.5);toc
Elapsed time is 0.532638 seconds.


  1. I decided to post this, because I have some of my fellow CMU vision hacker buddies using this. They found it helpful, so why shouldn't you?

  2. Try AccelerEyes Jacket for even more speed.

  3. I'm pretty sure that both accelereyes jacket and any other GPU library will help out the sliding window part of detection the most. Thanks for the tip!

  4. I have a CPP(MEX) version of the same algorithm, i will try to upload it as Well.

  5. Awesome! If you make a github gist out of your cpp/mex version, hopefully somebody will run some timing tests.

    Looking forward to seeing your version.

  6. Hello,

    Thanks for your great matlab codes, I've made a C version here

    I just tested some simple cases, ;)

  7. Could you please note the format of input and output?

  8. Hi Sai,
    The input is a collection of bounding boxes represented as a large matrix. Each row is an N-d vector with the first four numbers the bbox coordinates and the last number the score. The middle numbers can be just about anything else you want.

    The output is a new matrix of bbs, such that they don't overlap more than the overlap threshold.