Object Detection with Grammar Models
To appear in NIPS 2011 [pdf]
Today, I want to point out two upcoming NIPS papers which might be of interest to the Computer Vision community. First, we have a person detection paper from the hackers who brought you Latent Discriminatively Trained Part-based Models (aka voc-release-3.1 and voc-release-4.0). I personally don't care for grammars (I think exemplars are a much more data-driven and computation-friendly way of modeling visual concepts), but any paper with Pedro on the author list is worth checking out. Maybe after I digest all the details, I'll jump on the grammar bandwagon (but I doubt it). Also of note is the fact that Pedro Felzenszwalb has relocated to Brown University.
In the second paper, Vondrick et al. use active learning to select the frames which require human annotation. Rather than simply doing linear interpolation between labeled frames, they are truly putting the "machine-in-the-loop." When doing large-scale video annotation, this approach can supposedly save you tens of thousands of dollars.
Carl Vondrick and Deva Ramanan. "Video Annotation and Tracking with Active Learning." Neural Information Processing Systems (NIPS), Granada, Spain, December 2011. [paper] [slides]
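To make the contrast with plain interpolation concrete, here is a toy sketch (my own construction, not code from the paper): a 1-D trajectory is annotated either at uniformly spaced keyframes or by greedily querying the frame where the current interpolation is most suspect. For simplicity I cheat and use the true error as the disagreement signal; in the actual system that role is played by the tracker, since the ground truth is of course unknown. All function names here are hypothetical.

```python
import math

def interpolate(labels, n_frames):
    """Piecewise-linear interpolation between labeled frames (vatic-style)."""
    keys = sorted(labels)
    out = [0.0] * n_frames
    for t in range(n_frames):
        if t <= keys[0]:
            out[t] = labels[keys[0]]
        elif t >= keys[-1]:
            out[t] = labels[keys[-1]]
        else:
            for a, b in zip(keys, keys[1:]):
                if a <= t <= b:
                    w = (t - a) / (b - a)
                    out[t] = (1 - w) * labels[a] + w * labels[b]
                    break
    return out

def active_annotate(truth, budget, disagreement):
    """Label endpoints, then greedily query the most 'uncertain' frame."""
    n = len(truth)
    labels = {0: truth[0], n - 1: truth[n - 1]}
    while len(labels) < budget:
        est = interpolate(labels, n)
        # Query the unlabeled frame where the disagreement signal peaks.
        t = max((t for t in range(n) if t not in labels),
                key=lambda t: disagreement(t, est[t]))
        labels[t] = truth[t]
    return interpolate(labels, n)

# Synthetic 1-D trajectory (think: x-coordinate of a bounding box).
n = 100
truth = [50 + 40 * math.sin(2 * math.pi * t / n) for t in range(n)]

# Stand-in disagreement signal: the true error (a cheat for this toy).
signal = lambda t, pred: abs(truth[t] - pred)

budget = 8
active = active_annotate(truth, budget, signal)

# Baseline: the same budget spent on uniformly spaced keyframes.
uniform_keys = {t: truth[t] for t in range(0, n, n // (budget - 1))}
uniform_keys[n - 1] = truth[n - 1]
uniform = interpolate(uniform_keys, n)

err = lambda est: sum(abs(a - b) for a, b in zip(est, truth)) / n
print(f"uniform keyframes: {err(uniform):.2f}  active: {err(active):.2f}")
```

Both strategies spend the same annotation budget; the point of the active strategy is that the queries concentrate where interpolation is actually failing, which is where the dollar savings in the paper come from.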
Thanks for the tips, Tomasz.
I must admit though, after trying out vatic, I can't really see how this is different from simple linear interpolation between frames.
Vatic is Carl's older project, which you might know him from. The new paper uses active learning as a tool to minimize the number of frames labeled by manual labor. This is an improvement over vatic's linear interpolation, which has the user label every few frames and fills in the rest. I will make this clearer in the post.