Tombone's Computer Vision Blog: research papers

Friday, June 21, 2013

[Awesome@CVPR2013] Image Parsing with Regions and Per-Exemplar Detectors

I've been making an inventory of all the awesome papers at this year's CVPR 2013 conference, and one which clearly stood out was Tighe & Lazebnik's paper titled:

Finding Things: Image Parsing with Regions and Per-Exemplar Detectors. Joseph Tighe and Svetlana Lazebnik, CVPR 2013

This paper combines ideas from segmentation-based "scene parsing" (see the video below for the output of their older ECCV2010 SuperParsing system) with per-exemplar detectors (see my Exemplar-SVM paper, as well as my older Recognition by Association paper). I have worked and published in both of these lines of research, so when I tell you that this paper is worth reading, you should at least take a look. Below I outline the two ideas being synthesized in this paper, but for full details you should read their paper. See the overview figure below:

Idea #1: "Segmentation-driven" Image Parsing
The idea of using bottom-up segmentation to parse scenes is not new. Superpixels (very small segments that are likely to cover a single object category) coupled with some machine learning can be used to produce a coherent scene parsing system; however, the resulting object boundaries are often less precise than one would like. This shortcoming stems from the smoothing terms used in random field inference, and from the difficulty generic category-level classifiers have in reasoning about the extent of an object. To see how superpixel-based scene parsing works, check out the video from their earlier ECCV2010 paper:

ECCV2010 SuperParsing
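To make the superpixel pipeline concrete, here is a toy sketch, not the SuperParsing implementation. It assumes per-superpixel class costs already come from some classifier (the `unary` array and the function names are hypothetical), and it swaps in simple iterated conditional modes (ICM) with a Potts penalty as a stand-in for whatever random field inference the authors actually use. The `smooth` weight plays the role of the smoothing term mentioned above.

```python
import numpy as np


def superpixel_adjacency(sp):
    """Return the set of (i, j) superpixel-id pairs, i < j, that touch horizontally or vertically."""
    pairs = set()
    for a, b in ((sp[:, :-1], sp[:, 1:]), (sp[:-1, :], sp[1:, :])):
        differ = a != b
        for i, j in zip(a[differ], b[differ]):
            pairs.add((int(min(i, j)), int(max(i, j))))
    return pairs


def parse_superpixels(sp, unary, smooth=0.5, iters=5):
    """Label superpixels with Potts-smoothed ICM, then paint the labels back onto pixels.

    sp:    H x W int array of superpixel ids in 0..S-1 (hypothetical input)
    unary: S x C array of per-class costs from some classifier (lower is better)
    """
    num_sp, num_classes = unary.shape
    labels = unary.argmin(axis=1)  # start from each superpixel's favorite class
    neighbors = [[] for _ in range(num_sp)]
    for i, j in superpixel_adjacency(sp):
        neighbors[i].append(j)
        neighbors[j].append(i)
    classes = np.arange(num_classes)
    for _ in range(iters):
        for s in range(num_sp):
            # Potts term: pay `smooth` for every neighbor whose label differs from the candidate
            penalty = sum((classes != labels[n]).astype(float) for n in neighbors[s])
            cost = unary[s] + smooth * penalty
            labels[s] = int(cost.argmin())
    return labels[sp]


# Three vertical strips; the middle one only weakly prefers class 1.
sp = np.array([[0, 1, 2], [0, 1, 2]])
unary = np.array([[0.0, 1.0], [0.6, 0.5], [0.0, 1.0]])
no_smoothing = parse_superpixels(sp, unary, smooth=0.0)    # middle strip keeps class 1
with_smoothing = parse_superpixels(sp, unary, smooth=0.5)  # neighbors pull it to class 0
```

With `smooth=0` every superpixel keeps its classifier's favorite label; raising it buys spatial coherence at the cost of wiping out small or weakly-scored segments, which is one way object boundaries get lost.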

Idea #2: Per-exemplar segmentation mask transfer
For me, the most exciting thing about this paper is the integration of segmentation mask transfer from exemplar-based detections. The idea is quite simple: each detector is exemplar-specific and thus comes with its own (precise) segmentation mask. When such an exemplar-based system produces a detection, you can immediately transfer the exemplar's segmentation to the image in a purely top-down manner. This is what I have been trying to get people excited about for years! Congratulations to Joseph Tighe for incorporating these ideas into a full-blown image interpretation system. To see an example of mask transfer, check out the figure below.
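A minimal sketch of top-down mask transfer, under stated assumptions: each detection is a hypothetical `(class_id, score, box, exemplar_mask)` tuple whose half-open box `(x0, y0, x1, y1)` lies fully inside the image, exemplar masks are resized to the box with nearest-neighbor sampling, and overlapping detections of the same class are merged by keeping the maximum score per pixel. That aggregation rule is an illustrative choice, not necessarily what Tighe & Lazebnik do.

```python
import numpy as np


def resize_nearest(mask, h, w):
    """Nearest-neighbor resize of a 2-D mask to h x w."""
    rows = np.arange(h) * mask.shape[0] // h
    cols = np.arange(w) * mask.shape[1] // w
    return mask[rows[:, None], cols[None, :]]


def transfer_masks(image_shape, num_classes, detections):
    """Paint each detection's exemplar mask into a per-class, per-pixel score map."""
    height, width = image_shape
    scores = np.zeros((num_classes, height, width))
    for cls, score, (x0, y0, x1, y1), mask in detections:
        m = resize_nearest(mask.astype(bool), y1 - y0, x1 - x0)
        region = scores[cls, y0:y1, x0:x1]  # a view, so writes land in `scores`
        # Keep the strongest detection supporting each pixel (hypothetical merge rule).
        region[m] = np.maximum(region[m], score)
    return scores


# One class-1 detection covering the top half of a 4x4 image; its exemplar mask
# is "on" only in its left half, so only that part of the box gets labeled.
scores = transfer_masks((4, 4), 2, [(1, 0.9, (0, 0, 4, 2), np.array([[1, 0]]))])
```

Unlike a bounding box, the transferred mask carries the exemplar's object shape, which is exactly what region-based classifiers struggle to infer.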

Their system produces a per-pixel labeling of the input image, and as you can see below, the results are quite good. Here are some more outputs of their system compared with purely region-based and purely detector-based systems. Per-exemplar detectors clearly complement superpixel-based "segmentation-driven" approaches.
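One simple way to picture how the two cues complement each other is a per-pixel blend of the two score maps followed by an argmax. This is only a sketch and not the paper's combination scheme: the fixed weight `w_det` and the function name are hypothetical, and both score maps are assumed to already be on comparable scales.

```python
import numpy as np


def combine_scores(region_scores, detector_scores, w_det=0.5):
    """Blend C x H x W region- and detector-based class scores, then take a per-pixel argmax.

    w_det is a hypothetical fixed weight; both inputs are assumed to be on comparable scales.
    """
    if region_scores.shape != detector_scores.shape:
        raise ValueError("score maps must have the same C x H x W shape")
    combined = (1.0 - w_det) * region_scores + w_det * detector_scores
    return combined.argmax(axis=0)


# Region cue mildly prefers class 0 everywhere; a detector fires for class 1 on one pixel.
region = np.zeros((2, 1, 2))
region[0], region[1] = 0.6, 0.4
detector = np.zeros((2, 1, 2))
detector[1, 0, 1] = 1.0
labels = combine_scores(region, detector, w_det=0.5)
```

With `w_det=0` the detector is ignored and every pixel takes the region label; with a nonzero weight, a confident detection can override a weak region-based vote.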

This paper will be presented as an oral in the Orals 3C session called "Context and Scenes" to be held on Thursday, June 27th at CVPR 2013 in Portland, Oregon.

Tuesday, June 19, 2012

CVPR 2012 Day 1: Accidental Cameras, Large Jigsaws, and Cosegmentation

Today was the first day of CVPR 2012 in Providence, RI. Here's a quick recap:

On the administrative end of things, Deva Ramanan received an award recognizing his contributions to the field as a young CVPR researcher. This is a new nomination-based award, so be sure to nominate your favorite vision scientists next year! Deva's work has truly influenced the field. He is well known as a co-author of the Felzenszwalb et al. DPM object detector, and since then he has pushed his ideas on part-based models to the next level. Congratulations Deva, you are the type of researcher we should all strive to be.
Also on the administrative side, it looks like CVPR 2015 will be in Boston.
Here are some noteworthy papers from the oral sessions of Day 1:

During the first oral session, Antonio Torralba gave an intriguing talk showing how accidental pinhole and pinspeck (anti-pinhole) cameras are "all around us." In his presentation, he showed how a person walking in front of a window can be used to image the world outside that window. Additionally, he showed a variant of image-based Van Eck phreaking, where his technique could be used to view what is on a person's computer screen without looking at the screen directly.

Accidental pinhole and pinspeck cameras: revealing the scene outside the picture
Antonio Torralba and William T. Freeman
CVPR 2012
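The pinspeck idea rests on the linearity of light transport: the image cast through an open window, minus the image cast when a person partially blocks it, equals the image that a pinhole shaped like the person would have produced. Below is a 1-D "flatland" toy illustrating that identity, not Torralba and Freeman's actual pipeline; it assumes the light on the wall is simply the scene convolved with the aperture shape, and the scene and aperture arrays are made up.

```python
import numpy as np


def wall_image(scene, aperture):
    """Toy flatland model: light reaching the wall is the scene convolved with the aperture."""
    return np.convolve(scene, aperture, mode="full")


def pinspeck_image(scene, window, occluder):
    """Reference frame (open window) minus the frame where the occluder blocks part of it."""
    return wall_image(scene, window) - wall_image(scene, window - occluder)


scene = np.array([0.0, 1.0, 0.0, 0.0, 3.0, 0.0])  # made-up 1-D outside scene
window = np.ones(5)                                # wide opening: a very blurry wall image
occluder = np.array([0.0, 0.0, 1.0, 0.0, 0.0])    # small blocker (e.g. a person) in the window

# The difference is a sharp copy of the scene, as if imaged through an occluder-shaped pinhole.
recovered = pinspeck_image(scene, window, occluder)
```

The open-window image alone is hopelessly blurred; it is the small change caused by the occluder that carries the sharp information.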

Andrew Gallagher gave a really great presentation on using computer vision to solve jigsaw puzzles in which not only are the pieces jumbled, but their orientation is also unknown. His algorithm was used to solve very large puzzles, much larger than a human could tackle.

Jigsaw Puzzles with Pieces of Unknown Orientation
Andrew Gallagher
CVPR 2012
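A core ingredient of such solvers is a score for how well two piece edges fit, evaluated under every possible rotation of a piece. The sketch below uses plain sum-of-squared-differences across the shared edge as a simple stand-in (Gallagher's paper uses a more robust gradient-based compatibility measure). It assumes square pieces, only checks one piece placed to the right of another, and the function names are hypothetical.

```python
import numpy as np


def edge_dissimilarity(left, right):
    """Sum of squared differences between left's rightmost column and right's leftmost column."""
    return float(np.sum((left[:, -1].astype(float) - right[:, 0].astype(float)) ** 2))


def best_rotation(left, piece):
    """Try the four 90-degree rotations of a square `piece` placed to the right of `left`.

    Returns (k, cost), where np.rot90(piece, k) is the best-fitting orientation.
    """
    costs = [edge_dissimilarity(left, np.rot90(piece, k)) for k in range(4)]
    k = int(np.argmin(costs))
    return k, costs[k]


# Cut a smooth toy "image" into two square pieces, then rotate the right piece clockwise.
img = np.arange(32).reshape(4, 8)
left, right = img[:, :4], img[:, 4:]
scrambled = np.rot90(right, -1)
k, cost = best_rotation(left, scrambled)  # one counterclockwise turn restores the piece
```

Because orientation is unknown, every pairwise comparison has to consider all four rotations, which is part of what makes the unknown-orientation variant harder than the classic one.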

Gunhee Kim presented his newest work on co-segmentation. He has been working on this for quite some time and if you are interested in segmentation in image collections, you should definitely check it out.

On Multiple Foreground Cosegmentation
Gunhee Kim and Eric P. Xing
CVPR 2012
