Tuesday, October 25, 2011

NIPS 2011 preview: person grammars and machines-in-the-loop for video annotation

Object Detection with Grammar Models
To appear in NIPS 2011 [pdf]

Today, I want to point out two upcoming NIPS papers which might be of interest to the Computer Vision community.  First, we have a person detection paper from the hackers who brought you Latent Discriminatively Trained Part-based Models (aka voc-release-3.1 and voc-release-4.0).  I personally don't care for grammars (I think exemplars are a much more data-driven and computation-friendly way of modeling visual concepts), but I think any paper with Pedro on the author list is really worth checking out.  Maybe after I digest all the details, I'll jump on the grammar bandwagon (but I doubt it).  Also of note: Pedro Felzenszwalb has relocated to Brown University.

The second paper is by Carl Vondrick and Deva Ramanan (also of latent-svm fame).  Carl is the author of vatic and a fellow vision@github hacker.  Carl, like myself, has joined Antonio Torralba's group at MIT this fall.  He just started his PhD, so you can only expect the quality of his work to increase without bound over the next ~5 years.  vatic is an online, interactive video annotation tool for computer vision research that crowdsources work to Amazon's Mechanical Turk. It makes it easy to build massive, affordable video data sets and can be deployed on a cloud. Written in Python + C + JavaScript, vatic is free and open-source software. The video below showcases the power of vatic.

In this paper, Vondrick et al. use active learning to select the frames which require human annotation.  Rather than simply doing linear interpolation between frames, they are truly putting the "machine-in-the-loop." When doing large-scale video annotation, this approach can supposedly save you tens of thousands of dollars.
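
To make the contrast concrete, here is a minimal sketch of the two ideas. The first function is plain linear interpolation of a bounding box between human-labeled keyframes. The second is a deliberately simple stand-in for active frame selection: it asks for a label on the unlabeled frame where a tracker is least confident. The function names and the per-frame `confidence` scores are my own assumptions for illustration; this is not vatic's code and not the selection criterion from the paper.

```python
from bisect import bisect_right


def interpolate_box(keyframes, frame):
    """Linearly interpolate a box (x1, y1, x2, y2) at `frame`.

    `keyframes` maps frame index -> box for the human-labeled frames.
    Frames outside the labeled range reuse the nearest labeled box.
    """
    frames = sorted(keyframes)
    if frame <= frames[0]:
        return tuple(float(v) for v in keyframes[frames[0]])
    if frame >= frames[-1]:
        return tuple(float(v) for v in keyframes[frames[-1]])
    # Find the labeled frames immediately before and after `frame`.
    i = bisect_right(frames, frame)
    f0, f1 = frames[i - 1], frames[i]
    t = (frame - f0) / (f1 - f0)
    return tuple(
        (1 - t) * a + t * b for a, b in zip(keyframes[f0], keyframes[f1])
    )


def pick_frame_to_label(confidence, labeled):
    """Return the unlabeled frame with the lowest tracker confidence.

    `confidence` is a hypothetical list of per-frame scores (higher means
    the tracker is more sure). A real system would use a better criterion;
    this is only the simplest possible "machine-in-the-loop" rule.
    """
    candidates = [f for f in range(len(confidence)) if f not in labeled]
    if not candidates:
        return None
    return min(candidates, key=lambda f: confidence[f])


if __name__ == "__main__":
    keyframes = {0: (0, 0, 10, 10), 10: (10, 0, 20, 10)}
    print(interpolate_box(keyframes, 5))
    print(pick_frame_to_label([0.9, 0.2, 0.1, 0.8], labeled={2}))
```

With fixed-interval labeling, the annotator's clicks are spread evenly whether or not the object is doing anything interesting. A selection rule like the second function spends them where the machine is struggling, which is where the savings would come from.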

Carl Vondrick and Deva Ramanan. "Video Annotation and Tracking with Active Learning." Neural Information Processing Systems (NIPS), Granada, Spain, December 2011. [paper] [slides]


  1. Anonymous 12:27 AM

    Thanks for the tips, Tomasz.
    I must admit though, after trying out vatic, I can't really see how this is different from simple linear interpolation between frames.

  2. Vatic is Carl's older project, something which you might know him from. His new paper uses active learning to minimize the number of frames that have to be labeled by hand. This is an improvement over vatic's linear interpolation, which has the user label every few frames and fills in the rest. I will make this clearer in the post.