Friday, August 12, 2011

Ensemble of Exemplar-SVMs for Object Detection and Beyond

Over the next couple of days I will be announcing some very exciting news.  As many of you know, I defended my PhD this past Monday at CMU.  My family and friends came for the presentation as I defended 6 years of my life in front of Alyosha Efros, Martial Hebert, Takeo Kanade, and Pietro Perona.  You might be wondering what I've been up to this past year -- what sort of new vision research I have produced since the Visual Memex paper.

Throughout the last year or so I have slowly abandoned the segment-then-recognize approach and fully embraced the exemplar-based component of my research.  Because once you go exemplar, you don't go back!  If only Nosofsky were here, he would be proud.  Once you have established a good exemplar-detection alignment, problems such as segmentation become trivial.  In fact, exemplar association enables a host of meta-data transfer applications.  Here is a quick overview of my recent ICCV 2011 paper with Alexei Efros and Abhinav Gupta (the super new and exciting professor at CMU who will likely revolutionize the way we, vision researchers, think about the interplay of geometric reasoning and object recognition).

I will be defending my work to the ICCV crowd this fall in Barcelona.  Here is the paper.

Paper:
Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Ensemble of Exemplar-SVMs for Object Detection and Beyond. In ICCV, 2011. [PDF] [Project Page]

Abstract:
This paper proposes a conceptually simple but surprisingly powerful method which combines the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach. 

Exemplar Associations go Beyond Bounding Boxes

The method is based on training a separate linear SVM classifier for every exemplar in the training set. Each of these Exemplar-SVMs is thus defined by a single positive instance and millions of negatives. 
An ensemble of exemplars
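
To make the "one positive, many negatives" setup concrete, here is a minimal sketch of training a single Exemplar-SVM. It is not the code from the paper: the 2-D toy features, the random negatives, the plain subgradient-descent solver, and the parameters `pos_weight`, `lam`, `lr`, and `epochs` are all assumptions made for illustration. A real system would use image descriptors and far more negatives.

```python
import random

def score(w, b, x):
    """Linear SVM score: dot(w, x) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_exemplar_svm(positive, negatives, pos_weight=50.0, lam=0.01,
                       lr=0.01, epochs=50):
    """Fit (w, b) for ONE exemplar with subgradient descent on a hinge loss.

    pos_weight up-weights the single positive so that it is not swamped
    by the negatives (an assumption of this sketch, as are all defaults).
    """
    w = [0.0] * len(positive)
    b = 0.0
    # One positive example (label +1), everything else is a negative (label -1).
    data = [(positive, 1.0, pos_weight)] + [(x, -1.0, 1.0) for x in negatives]
    for _ in range(epochs):
        for x, y, weight in data:
            margin = y * score(w, b, x)
            # L2 regularization shrinks w a little at every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # The example violates the margin: step toward classifying it.
                w = [wi + lr * weight * y * xi for wi, xi in zip(w, x)]
                b += lr * weight * y
    return w, b

# Toy data: one exemplar and 500 negatives drawn from a different region.
rng = random.Random(0)
exemplar = [1.0, 1.0]
negatives = [[rng.uniform(-2.0, -0.2), rng.uniform(-2.0, -0.2)]
             for _ in range(500)]

w, b = train_exemplar_svm(exemplar, negatives)
print("exemplar score:", score(w, b, exemplar))
print("best negative score:", max(score(w, b, n) for n in negatives))
```

Repeating this for every training instance yields one detector per exemplar, which is what makes the per-detection association possible later on.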

While each detector is quite specific to its exemplar, we empirically observe that an ensemble of such Exemplar-SVMs offers surprisingly good generalization. Our performance on the PASCAL VOC detection task is on par with the much more complex latent part-based model of Felzenszwalb et al., at only a modest computational cost increase. 

Generalization from a single positive instance
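
The ensemble idea can be sketched in a few lines: every Exemplar-SVM scores a candidate window, and the winning exemplar is kept alongside the score, so each detection carries a pointer back to one training instance. The sketch below is an illustration under assumptions, not the paper's pipeline. The exemplar ids, the hand-set weights, and the `threshold` parameter are hypothetical, and it assumes the scores of the independently trained SVMs have already been put on a comparable scale.

```python
def score(w, b, x):
    """Linear SVM score: dot(w, x) + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def detect_with_ensemble(exemplar_svms, window, threshold=0.0):
    """Score one window with every Exemplar-SVM.

    Returns (exemplar_id, score) for the highest-scoring exemplar that
    fires above the threshold, or None if no exemplar fires.
    """
    best_id, best_score = None, threshold
    for exemplar_id, (w, b) in exemplar_svms.items():
        s = score(w, b, window)
        if s > best_score:
            best_id, best_score = exemplar_id, s
    return None if best_id is None else (best_id, best_score)

# Hypothetical ensemble: exemplar id -> (weights, bias).
exemplar_svms = {
    "train_0042": ([1.0, 0.0], -0.5),
    "train_0107": ([0.0, 1.0], -0.5),
}

detection = detect_with_ensemble(exemplar_svms, [0.25, 1.0])
background = detect_with_ensemble(exemplar_svms, [0.0, 0.0])
print(detection)   # which exemplar explains this window, and how strongly
print(background)  # None when no exemplar fires
```

Returning the exemplar id rather than only a category label is the design choice that matters here: the id is the key used to look up segmentation, geometry, or any other meta-data attached to that training instance.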


But the central benefit of our approach is that it creates an explicit association between each detection and a single training exemplar. Because most detections show good alignment to their associated exemplar, it is possible to transfer any available exemplar meta-data (segmentation, geometric structure, 3D model, etc.) directly onto the detections, which can then be used as part of overall scene understanding.

This paper can be rightfully seen as a marriage of my older work on learning per-exemplar distances with the discriminative training method of Felzenszwalb et al.

Here are some summary pictures from my paper and a short description of each one:

1. Going beyond object detection (i.e., producing a category-labeled bounding box), we look at several meta-data transfer applications.  Meta-data transfer is a way of interpreting an object detection that transcends category membership.  The first task is that of geometry transfer.


Geometry Transfer

2. Segmentation is a well-known problem in computer vision -- generally tackled with bottom-up approaches which strive to produce coherent regions based on pixel-pixel appearance similarity.  We show that a recognize-then-segment approach is possible, and in particular an associate-then-segment approach based on transferring segmentations from exemplars onto detection windows.

Segmentation Transfer


3. Object exemplars often show an interplay of objects, suggesting that it is possible to use the recognition of one object to prime the presence of another.
Related Object Priming
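
As a rough illustration of the segmentation transfer described in item 2, the sketch below pastes an exemplar's binary mask into the detection window using nearest-neighbor scaling. It assumes the exemplar and the detection are well aligned, which is the premise of the approach. The function name `transfer_mask`, the box convention (x0, y0, x1, y1 with exclusive upper bounds), and the tiny 2x2 mask are made up for this example and are not taken from the paper's code.

```python
def transfer_mask(exemplar_mask, box, image_size):
    """Warp an exemplar's binary mask into a detection box.

    exemplar_mask: list of rows of 0/1 values, defined on the exemplar's box.
    box: (x0, y0, x1, y1) detection window in pixels, upper bounds exclusive.
    image_size: (width, height) of the test image.
    Returns an image-sized mask (list of rows) with the mask pasted in.
    """
    width, height = image_size
    x0, y0, x1, y1 = box
    mask_h, mask_w = len(exemplar_mask), len(exemplar_mask[0])
    out = [[0] * width for _ in range(height)]
    # Visit only the part of the box that lies inside the image.
    for y in range(max(y0, 0), min(y1, height)):
        # Nearest-neighbor lookup: map the box row back to a mask row.
        src_y = (y - y0) * mask_h // (y1 - y0)
        for x in range(max(x0, 0), min(x1, width)):
            src_x = (x - x0) * mask_w // (x1 - x0)
            out[y][x] = exemplar_mask[src_y][src_x]
    return out

# A 2x2 exemplar mask stretched onto a 4x4 detection inside a 6x6 image.
exemplar_mask = [[0, 1],
                 [1, 1]]
transferred = transfer_mask(exemplar_mask, box=(1, 1, 5, 5), image_size=(6, 6))
for row in transferred:
    print("".join(str(v) for v in row))
```

The same lookup pattern would apply to other per-pixel meta-data attached to an exemplar, such as a geometric labeling, since only the values stored in the mask change.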

P.S. Dr. Abhinav Gupta is looking for students, so if you are a 1st year CMU visionary (CMU visionary = robotics vision student@CMU), check out his presentation during the RI Immigration Course.

P.P.S. Anonymous Reviewer#3: Not only have you single-handedly saved my paper from the clutches of ICCV death, but you have resurrected a graduate student's faith in the justice of the vision peer review process.

10 comments:

  1. Way to go Dr. T!
    Looking forward to where this goes next.

  2. Thanks Abhi, I am really excited about the conceptual simplicity of this method and I am confident that only great things will result!

  3. Anonymous 4:22 PM

    Can I know the review ratings of the paper?

  4. Excellent paper and work. If this paper had been rejected, it would have been a pity :)

  6. Anonymous 2:52 PM

    Nice work with clear explanation! Would you recommend using this approach for pose-invariant face detection?

    Replies
    1. Exemplar-SVMs definitely work for face detection, but the space of faces isn't as large as the space of appearances for other less rigid objects.

      For faces, take a look at http://luchaochao.me/papers/GaussianFace.pdf

  7. Thanks for your quick reply. I looked at GaussianFace, but they haven't mentioned how they detected faces. Do you know what their face detection algorithm is?

  8. Hi Sara, I don't know the details of their detection method, but you can take a look at the following 2012 paper on face detection in the wild from Deva Ramanan's group:

    http://www.ics.uci.edu/~xzhu/face/
