Over the last year or so I have slowly abandoned the segment-then-recognize approach and fully embraced the exemplar-based component of my research. Because once you go exemplar, you don't go back! If only Nosofsky were here, he would be proud. Once you have established a good exemplar-detection alignment, problems such as segmentation become trivial. In fact, exemplar association enables a host of meta-data transfer applications. Here is a quick overview of my recent ICCV 2011 paper with Alexei Efros and Abhinav Gupta (the super new and exciting professor at CMU who will likely revolutionize the way we, vision researchers, think about the interplay of geometric reasoning and object recognition).
I will be defending my work to the ICCV crowd this fall in Barcelona. Here is the paper.
Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Ensemble of Exemplar-SVMs for Object Detection and Beyond. In ICCV, 2011. [PDF] [Project Page]
This paper proposes a conceptually simple but surprisingly powerful method which combines the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach.
Exemplar Associations go Beyond Bounding Boxes
The method is based on training a separate linear SVM classifier for every exemplar in the training set. Each of these Exemplar-SVMs is thus defined by a single positive instance and millions of negatives.
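To make the "single positive, millions of negatives" setup concrete, here is a minimal sketch of training one such classifier. The feature vectors are hypothetical placeholders for HOG-style window descriptors, and I approximate the paper's asymmetric regularization (separate costs for the positive and the negatives) with scikit-learn's `class_weight` option; the names `exemplar`, `negatives`, and `score` are illustrative, not from the paper's code.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical HOG-like descriptors: one positive exemplar and many
# negative windows mined from images that do not contain the category.
exemplar = rng.normal(loc=1.0, size=(1, 128))       # the single positive
negatives = rng.normal(loc=0.0, size=(2000, 128))   # negative windows

X = np.vstack([exemplar, negatives])
y = np.array([1] + [0] * len(negatives))

# Heavily weight the lone positive so it is not drowned out by the
# negatives -- a stand-in for the paper's two regularization constants.
svm = LinearSVC(C=0.01, class_weight={1: 100.0, 0: 1.0})
svm.fit(X, y)

# The learned (w, b) acts as a sliding-window template for this exemplar.
w, b = svm.coef_.ravel(), svm.intercept_[0]
score = lambda x: x @ w + b
```

In the actual pipeline this is repeated for every exemplar in the training set, with rounds of hard-negative mining rather than a single fixed negative pool.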
An ensemble of exemplars
While each detector is quite specific to its exemplar, we empirically observe that an ensemble of such Exemplar-SVMs offers surprisingly good generalization. Our performance on the PASCAL VOC detection task is on par with the much more complex latent part-based model of Felzenszwalb et al., at only a modest computational cost increase.
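At detection time the ensemble is just the bank of learned templates: every Exemplar-SVM scores each candidate window, and the firing exemplar itself is recorded. A toy sketch, with randomly generated templates standing in for trained ones (the function `detect` is illustrative; the real system also calibrates scores across exemplars and applies non-maximum suppression):

```python
import numpy as np

# Hypothetical ensemble: one (w, b) template per training exemplar.
rng = np.random.default_rng(1)
W = rng.normal(size=(50, 128))   # 50 exemplar templates
b = rng.normal(size=50)

def detect(window_feat):
    """Score a candidate window against every Exemplar-SVM and return
    the best score together with the index of the winning exemplar."""
    scores = W @ window_feat + b
    best = int(np.argmax(scores))
    return scores[best], best

feat = rng.normal(size=128)
score, exemplar_id = detect(feat)
```

The key point is the second return value: each detection comes stamped with the identity of a specific training exemplar, which is what enables the meta-data transfer described next.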
Generalization from a single positive instance
But the central benefit of our approach is that it creates an explicit association between each detection and a single training exemplar. Because most detections show good alignment to their associated exemplar, it is possible to transfer any available exemplar meta-data (segmentation, geometric structure, 3D model, etc.) directly onto the detections, which can then be used as part of overall scene understanding.
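As one concrete instance of meta-data transfer, a segmentation mask stored with the exemplar can be warped onto the detection window. Below is a minimal sketch assuming the simplest possible alignment, a nearest-neighbor resize of the mask to the detected bounding box; `transfer_mask` and its arguments are hypothetical names for illustration.

```python
import numpy as np

def transfer_mask(exemplar_mask, det_box):
    """Warp an exemplar's binary segmentation mask onto a detection window.

    exemplar_mask : 2-D boolean array, the mask inside the exemplar's box.
    det_box       : (x0, y0, x1, y1) detection window in image coordinates.

    Nearest-neighbor resize: since detections align well with their
    associated exemplar, scaling the mask to the detected box is often
    enough to get a usable segmentation.
    """
    x0, y0, x1, y1 = det_box
    h, w = y1 - y0, x1 - x0
    eh, ew = exemplar_mask.shape
    rows = np.arange(h) * eh // h
    cols = np.arange(w) * ew // w
    return exemplar_mask[np.ix_(rows, cols)]

# Toy example: a 4x4 exemplar mask transferred onto an 8x8 detection.
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
warped = transfer_mask(mask, (10, 20, 18, 28))
```

The same transfer mechanism applies unchanged to other per-exemplar annotations: replace the mask with surface-orientation labels, a 3D model, or related-object boxes.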
This paper can be rightfully seen as a marriage of my older work on learning per-exemplar distances with the discriminative training method of Felzenszwalb et al.
Here are some summary pictures from my paper and a short description of each one:
1. Going beyond object detection (i.e., producing a category-labeled bounding box), we look at several meta-data transfer applications. Meta-data transfer is a way of interpreting an object detection that transcends category membership. The first task is geometry transfer.
2. Segmentation is a well-known problem in computer vision -- generally tackled with bottom-up approaches which strive to produce coherent regions based on pixel-to-pixel appearance similarity. We show that a recognize-then-segment approach is possible, and in particular an associate-then-segment approach based on transferring segmentations from exemplars onto detection windows.
3. Object exemplars often show an interplay of objects, suggesting that it is possible to use the recognition of one object to prime the presence of another.
Related Object Priming
P.S. Dr. Abhinav Gupta is looking for students, so if you are a 1st-year CMU visionary (CMU visionary = robotics vision student @ CMU), check out his presentation during the RI Immigration Course.
P.P.S. Anonymous Reviewer #3: Not only have you single-handedly saved my paper from the clutches of ICCV death, but you have resurrected a graduate student's faith in the justice of the vision peer review process.