Finding Things: Image Parsing with Regions and Per-Exemplar Detectors. Joseph Tighe and Svetlana Lazebnik, CVPR 2013
Idea #1: "Segmentation-driven" Image Parsing
The idea of using bottom-up segmentation to parse scenes is not new. Superpixels (very small segments, each likely to contain a single object category) coupled with some machine learning can be used to produce a coherent scene parsing system; however, the resulting object boundaries are not as precise as one would expect. This shortcoming stems both from the smoothing terms used in random field inference and from the fact that generic category-level classifiers have a hard time reasoning about the full extent of an object. To see how superpixel-based scene parsing works, check out the video from their older paper from ECCV2010:
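The pipeline above can be sketched in a few lines of toy code. This is a minimal illustration, not the paper's actual system: the features, the nearest-exemplar unary term, and the single smoothing sweep (a crude stand-in for random field inference) are all my own simplifying assumptions.

```python
import numpy as np

def parse_superpixels(feats, train_feats, train_labels, adjacency,
                      n_classes, lam=0.5):
    """Label each superpixel, then smooth labels over the adjacency graph.

    feats        : (N, D) array of per-superpixel features (hypothetical)
    train_feats  : (M, D) labeled retrieval-set features
    train_labels : (M,) integer class labels for train_feats
    adjacency    : dict mapping superpixel index -> list of neighbor indices
    """
    # Unary term: negative distance to the nearest training exemplar per class.
    scores = np.full((len(feats), n_classes), -np.inf)
    for c in range(n_classes):
        ex = train_feats[train_labels == c]
        d = np.linalg.norm(feats[:, None, :] - ex[None, :, :], axis=2).min(axis=1)
        scores[:, c] = -d
    labels = scores.argmax(axis=1)

    # One sweep of iterated conditional modes: pay a penalty `lam` for every
    # neighbor that disagrees with the candidate label (the "smoothing term").
    for i in range(len(feats)):
        neigh = adjacency[i]
        cost = -scores[i] + lam * np.array(
            [sum(labels[j] != c for j in neigh) for c in range(n_classes)])
        labels[i] = cost.argmin()
    return labels
```

Note how the pairwise penalty `lam` is exactly what blurs object boundaries: a superpixel with weak unary evidence simply inherits the label of its neighbors.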
Idea #2: Per-exemplar segmentation mask transfer
For me, the most exciting thing about this paper is the integration of segmentation mask transfer from exemplar-based detections. The idea is quite simple: each detector is exemplar-specific and is thus equipped with its own (precise) segmentation mask. When you produce detections from such exemplar-based systems, you can immediately transfer segmentations in a purely top-down manner. This is what I have been trying to get people excited about for years! Congratulations to Joseph Tighe for incorporating these ideas into a full-blown image interpretation system. To see an example of mask transfer, check out the figure below.
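The mask-transfer step can be sketched as follows. This is a hedged illustration under my own assumptions (the function names, the nearest-neighbor mask warp, and the score-weighted accumulation are mine, not the paper's exact formulation): each detection remembers which exemplar fired, so that exemplar's binary mask can be warped into the detected box and accumulated as per-pixel evidence for the class.

```python
import numpy as np

def transfer_mask(mask, box, out_shape):
    """Nearest-neighbor resize of a binary `mask` into `box` = (y0, x0, y1, x1),
    pasted onto a zero canvas of size `out_shape`."""
    y0, x0, y1, x1 = box
    h, w = y1 - y0, x1 - x0
    ys = np.arange(h) * mask.shape[0] // h
    xs = np.arange(w) * mask.shape[1] // w
    out = np.zeros(out_shape)
    out[y0:y1, x0:x1] = mask[np.ix_(ys, xs)]
    return out

def accumulate_evidence(detections, exemplar_masks, out_shape):
    """detections: list of (exemplar_id, box, score); each detection transfers
    its exemplar's own mask, weighted by detection confidence."""
    evidence = np.zeros(out_shape)
    for ex_id, box, score in detections:
        evidence += score * transfer_mask(exemplar_masks[ex_id], box, out_shape)
    return evidence
```

The key property is top-down precision: the mask shape comes from the single matched exemplar, not from bottom-up grouping, so the transferred boundary is as sharp as the exemplar's own annotation.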
Their system produces a per-pixel labeling of the input image, and as you can see below, the results are quite good. Here are some more outputs of their system compared to purely region-based and purely detector-based baselines. Using per-exemplar detectors clearly complements superpixel-based "segmentation-driven" approaches.
This paper will be presented as an oral in the Orals 3C session called "Context and Scenes" to be held on Thursday, June 27th at CVPR 2013 in Portland, Oregon.