Object Detection with Grammar Models
To appear in NIPS 2011 [pdf]
Today, I want to point out two upcoming NIPS papers which might be of interest to the Computer Vision community. First, we have a person detection paper from the hackers who brought you Latent Discriminatively Trained Part-based Models (aka voc-release-3.1 and voc-release-4.0). I personally don't care for grammars (I think exemplars are a much more data-driven and computation-friendly way of modeling visual concepts), but any paper with Pedro on the author list is worth checking out. Maybe after I digest all the details, I'll jump on the grammar bandwagon (but I doubt it). Also of note is the fact that Pedro Felzenszwalb has relocated to Brown University.
In the second paper, Vondrick et al. use active learning to select the frames which require human annotation. Rather than simply doing linear interpolation between annotated frames, they are truly putting the "machine in the loop": the system itself decides where a human's clicks are most valuable. When doing large-scale video annotation, this approach can supposedly save you tens of thousands of dollars.
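To make the contrast concrete, here is a toy sketch (my own illustration, not the authors' actual method): a human annotates a few keyframes, boxes in between are linearly interpolated, and a pluggable "uncertainty" score picks the next frame to send back to the annotator. All function names here are hypothetical.

```python
# Toy active-annotation loop: interpolate between human-labeled keyframes,
# then query the most "uncertain" unannotated frame. Illustration only.

def interpolate_box(box_a, box_b, t):
    """Linearly interpolate two (x, y, w, h) boxes; t in [0, 1]."""
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))

def interpolated_track(keyframes):
    """keyframes: dict of frame_index -> box from a human annotator.
    Returns a dict covering every frame between the first and last keyframe."""
    frames = sorted(keyframes)
    track = {}
    for f0, f1 in zip(frames, frames[1:]):
        for f in range(f0, f1 + 1):
            t = (f - f0) / (f1 - f0)
            track[f] = interpolate_box(keyframes[f0], keyframes[f1], t)
    return track

def next_query_frame(track, keyframes, uncertainty):
    """Active-learning step: among unannotated frames, pick the one whose
    interpolated box the model is least sure about. `uncertainty` is any
    callable (frame_index, box) -> float, e.g. a detector's disagreement."""
    candidates = [f for f in track if f not in keyframes]
    return max(candidates, key=lambda f: uncertainty(f, track[f]))
```

With a stand-in uncertainty of "distance to the nearest keyframe," the loop queries the midpoint frame first, which matches the intuition that interpolation error is worst far from human labels; in the real system the score would come from a learned model, not frame distance.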