If you care about building large-scale object recognition systems, you have to care about speed. And every little bit of performance counts -- so why not first optimize the stuff which is slowing you down?
NMS (non-maximum suppression) is a very popular post-processing method for eliminating redundant object detection windows. I have take Felzenszwalb et al.'s nms.m and made it significantly faster by eliminating an inner loop. 6 years of grad school, 6 years of building large-scale vision systems in Matlab, and you really learn how to vectorize code. The code I call millions of times needs to be fast, and nms is one of those routines I call all the time.
The code is found below as a Github gist -- which was taken from my Exemplar-SVM object recognition library (from my ICCV2011 paper: Ensemble of Exemplar-SVMs for Object Detection and Beyond). The same file, nms.m, can also be found a part of the Exemplar-SVM library on Github. In fact, this code produces the same result as Pedro's code, but is much faster. Here is one timing experiment I performed when performing nms on ~300K windows, where my version is roughly 100 times faster. When you deal with exemplar-SVMs you have to deal with lots of detectors (i.e., lots of detection windows), so fast NMS is money.
>> tic;top = nms_original(bbs,.5);toc
Elapsed time is 58.172313 seconds.
>> tic;top = nms_fast(bbs,.5);toc
Elapsed time is 0.532638 seconds.