Today I want to draw everyone's attention to a super-cool paper from day 1 of this year's ICCV conference (ICCV 2011). Megha Pandey is the lead author, and Lana Lazebnik (of spatial pyramid fame) is the seasoned vision community member supervising this research. The idea is really simple (and simplicity is a plus!): train a latent deformable part-based model for scenes. Some of the scene models look really cool, and I encourage everybody interested in scene recognition to take a look.
A Part-based Scene Model
One reason I like this paper is that, just like our SIGGRAPH Asia 2011 paper on cross-domain image matching, it uses HOG features to represent scenes and applies the resulting models in a sliding-window fashion. This is very different from the traditional image-to-feature-vector mapping used in systems based on the GIST descriptor, which summarize the whole image with a single vector. Sliding-window approaches allow the detection of a scene inside another image! Framing issues are elegantly handled by allowing the model to slide.
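To make the sliding-window idea concrete, here is a minimal sketch, not the authors' code. It assumes the image has already been converted into a grid of per-cell descriptors (for example HOG cells) and that a linear template of weights has been learned; the function names `sliding_window_scores` and `best_window` are made up for illustration.

```python
import numpy as np

def sliding_window_scores(features, template):
    """Score a linear template at every valid position of a feature grid.

    features: (H, W, D) array of per-cell descriptors (e.g. HOG cells).
    template: (h, w, D) array of learned weights (hypothetical here).
    Returns an (H - h + 1, W - w + 1) array of dot-product scores.
    """
    H, W, D = features.shape
    h, w, d = template.shape
    if d != D or h > H or w > W:
        raise ValueError("template must fit inside the feature grid")
    scores = np.empty((H - h + 1, W - w + 1))
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            # Dot product between the template and the window under it.
            window = features[y:y + h, x:x + w, :]
            scores[y, x] = np.sum(window * template)
    return scores

def best_window(features, template):
    """Return (row, col, score) of the highest-scoring window."""
    scores = sliding_window_scores(features, template)
    y, x = np.unravel_index(np.argmax(scores), scores.shape)
    return int(y), int(x), float(scores[y, x])
```

Because the template is scored at every offset, a scene that occupies only part of the frame can still get a high score at the right position, which is how sliding handles framing issues. A real detector would also search over scales, which this sketch leaves out.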
Scene Recognition and Weakly Supervised Object Localization with Deformable Part-Based Models. Megha Pandey and Svetlana Lazebnik. Proceedings of the IEEE International Conference on Computer Vision, 2011. Project Page [pdf]
Abstract: Weakly supervised discovery of common visual structure in highly variable, cluttered images is a key problem in recognition. We address this problem using deformable part-based models (DPM’s) with latent SVM training. These models have been introduced for fully supervised training of object detectors, but we demonstrate that they are also capable of more open-ended learning of latent structure for such tasks as scene recognition and weakly supervised object localization. For scene recognition, DPM’s can capture recurring visual elements and salient objects; in combination with standard global image features, they obtain state-of-the-art results on the MIT 67-category indoor scene dataset. For weakly supervised object localization, optimization over latent DPM parameters can discover the spatial extent of objects in cluttered training images without ground-truth bounding boxes. The resulting method outperforms a recent state-of-the-art weakly supervised object localization approach on the PASCAL-07 dataset.
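For readers who have not seen a deformable part-based model before, the sketch below shows the general shape of the score at a single root location: the root filter response plus, for each part, the best trade-off between the part's filter response and a penalty for drifting from its anchor. This is a simplified illustration under my own assumptions (an isotropic quadratic penalty, a small fixed search radius, precomputed response maps), not the formulation or code from the paper; `dpm_score` and its parameters are hypothetical names.

```python
import numpy as np

def dpm_score(root_response, part_responses, anchors, deform_weights, radius=2):
    """Score a star-shaped part model at one root location.

    root_response: float, root filter response at this location.
    part_responses: list of 2-D arrays, one filter response map per part.
    anchors: list of (y, x) ideal part positions inside each map.
    deform_weights: list of floats, quadratic displacement penalty per part
        (an assumed simplification of the usual deformation cost).
    Returns (total score, list of chosen part positions).
    """
    total = float(root_response)
    placements = []
    for resp, (ay, ax), wd in zip(part_responses, anchors, deform_weights):
        best, best_pos = -np.inf, None
        # Part positions are latent: try each displacement near the anchor.
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = ay + dy, ax + dx
                if 0 <= y < resp.shape[0] and 0 <= x < resp.shape[1]:
                    s = resp[y, x] - wd * (dy * dy + dx * dx)
                    if s > best:
                        best, best_pos = s, (y, x)
        total += best
        placements.append(best_pos)
    return total, placements
```

The maximization over part positions is the "latent" part: the placements are never annotated, they are chosen by the model itself, both at test time and while training.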
Weakly Supervised Object Localization (see paper for details)