Wednesday, June 26, 2013

[Awesome@CVPR2013] Scene-SIRFs, Sketch Tokens, Detecting 100,000 object classes, and more

I promised to blog about some more exciting papers at CVPR 2013, so here is a short list of a few papers which stood out.  This list also includes this year's award-winning paper: Fast, Accurate Detection of 100,000 Object Classes on a Single Machine.  Congrats to Google Research on the excellent paper!



This paper builds on ideas from Abhinav Gupta's work on 3D scene understanding as well as Ali Farhadi's work on visual phrases; in addition, it uses RGB-D input data (like many other CVPR 2013 papers).

W. Choi, Y.-W. Chao, C. Pantofaru, S. Savarese. "Understanding Indoor Scenes Using 3D Geometric Phrases." CVPR, 2013. [pdf]

This paper uses the crowd to learn which parts of birds are useful for fine-grained categorization.  If you work on fine-grained categorization or run experiments with MTurk, then you gotta check this out!
Fine-Grained Crowdsourcing for Fine-Grained Recognition. Jia Deng, Jonathan Krause, Li Fei-Fei. CVPR, 2013. [pdf]

This paper won the best paper award.  Congrats Google Research!

Fast, Accurate Detection of 100,000 Object Classes on a Single Machine. Thomas Dean, Mark Ruzon, Mark Segal, Jon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik. CVPR, 2013. [pdf]


The following is the Scene-SIRFs paper, which I thought was one of the best papers at this year's CVPR.  The idea is to decompose an input image into intrinsic images using Barron's algorithm, which was initially shown to work on isolated objects but is now being applied to realistic scenes.

Intrinsic Scene Properties from a Single RGB-D Image. Jonathan T. Barron, Jitendra Malik. CVPR, 2013. [pdf]
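If the term "intrinsic images" is new to you, the classic model is that an image I factors into reflectance R times shading S, i.e. I = R * S.  The snippet below is only a toy Retinex-style baseline in that spirit, NOT Barron and Malik's actual algorithm (which jointly recovers shape, illumination, and reflectance with learned priors); the smoothing bandwidth `sigma` is an arbitrary choice of mine.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def naive_intrinsic_decomposition(image, sigma=15.0):
    """Toy Retinex-style split of an image into reflectance and shading.

    NOT Barron & Malik's SIRFS -- just an illustration of the
    intrinsic-image model I = R * S.  `image` is an HxW grayscale
    float array with values in (0, 1].
    """
    log_i = np.log(np.clip(image, 1e-4, 1.0))
    log_s = gaussian_filter(log_i, sigma)   # low frequencies -> shading
    log_r = log_i - log_s                   # residual -> reflectance
    return np.exp(log_r), np.exp(log_s)     # R * S reconstructs I
```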


This is a graph-based localization paper which uses a sort of "Visual Memex" to solve the problem.
Graph-Based Discriminative Learning for Location Recognition. Song Cao, Noah Snavely. CVPR, 2013. [pdf]


This paper provides an exciting new way of localizing contours in images which is orders of magnitude faster than gPb.  There is code available, so the impact is likely to be high.

Sketch Tokens: A Learned Mid-level Representation for Contour and Object Detection. Joseph J. Lim, C. Lawrence Zitnick, and Piotr Dollar. CVPR, 2013. [pdf] [code@github]
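If you want a feel for how the method works before digging into the real code on github, here is a rough scikit-learn schematic of the two-stage pipeline described in the paper: cluster human-drawn contour patches into "sketch token" classes, then train a random forest to predict the token class of each image patch.  All the variable names (`contour_patches`, `pos_features`, `neg_features`) are hypothetical stand-ins, and the hyperparameters are mine, not the authors'.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Hypothetical inputs: `contour_patches` is an (N, 35, 35) array of
# human-drawn contour patches; `pos_features` / `neg_features` are
# image-feature vectors for patches with / without a contour.

def train_sketch_token_model(contour_patches, pos_features, neg_features,
                             n_tokens=150):
    # 1) Cluster the hand-drawn contour patches into "sketch token" classes.
    flat = contour_patches.reshape(len(contour_patches), -1)
    tokens = KMeans(n_clusters=n_tokens).fit(flat)

    # 2) Train a random forest to predict the token class of each patch;
    #    class 0 is reserved for "no contour".
    X = np.vstack([pos_features, neg_features])
    y = np.concatenate([tokens.labels_ + 1,
                        np.zeros(len(neg_features), dtype=int)])
    forest = RandomForestClassifier(n_estimators=25).fit(X, y)
    return tokens, forest

# At test time, per-pixel edge strength is one minus the predicted
# probability of the "no contour" class:
#   edge = 1.0 - forest.predict_proba(test_features)[:, 0]
```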

Friday, June 21, 2013

[Awesome@CVPR2013] Image Parsing with Regions and Per-Exemplar Detectors

I've been making an inventory of all the awesome papers at this year's CVPR 2013 conference, and one which clearly stood out was Tighe & Lazebnik's paper titled "Image Parsing with Regions and Per-Exemplar Detectors."


This paper combines ideas from segmentation-based "scene parsing" (see the video below for the output of their older ECCV 2010 SuperParsing system) as well as per-exemplar detectors (see my Exemplar-SVM paper, as well as my older Recognition by Association paper).  I have worked and published in both of these lines of research, so when I tell you that this paper is worth reading, you should at least take a look.  Below I outline the two ideas being synthesized in this paper, but for all of the details you should read their paper (PDF link).  See the overview figure below:


Idea #1: "Segmentation-driven" Image Parsing
The idea of using bottom-up segmentation to parse scenes is not new.  Superpixels (very small segments which are likely to contain a single object category), coupled with some machine learning, can be used to produce a coherent scene parsing system; however, the boundaries of objects are not as precise as one would expect.  This shortcoming stems from the smoothing terms used in random field inference, and from the fact that generic category-level classifiers have a hard time reasoning about the extent of an object.  Below is a toy code sketch of the basic pipeline, followed by a video from their older ECCV 2010 paper showing superpixel-based scene parsing in action:
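As a concrete (if heavily simplified) illustration, this snippet runs SLIC superpixels from scikit-image and classifies each one independently.  Mean RGB color is my stand-in for the much richer features (and the MRF smoothing) used in SuperParsing, and `classifier` is assumed to be any pre-trained scikit-learn-style model.

```python
import numpy as np
from skimage.segmentation import slic

def parse_with_superpixels(image, classifier):
    # Oversegment the image into superpixels (small coherent regions
    # that are likely to contain a single object category).
    superpixels = slic(image, n_segments=400, compactness=10.0)

    parse = np.zeros(superpixels.shape, dtype=int)
    for sp in np.unique(superpixels):
        mask = superpixels == sp
        feat = image[mask].mean(axis=0)          # toy feature: mean RGB
        parse[mask] = classifier.predict([feat])[0]  # one label per segment
    return parse
```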


Idea #2: Per-exemplar segmentation mask transfer
For me, the most exciting thing about this paper is the integration of segmentation mask transfer from exemplar-based detections.  The idea is quite simple: each detector is exemplar-specific and is thus equipped with its own (precise) segmentation mask.  When you produce detections from such an exemplar-based system, you can immediately transfer segmentations in a purely top-down manner.  This is what I have been trying to get people excited about for years!  Congratulations to Joseph Tighe for incorporating these ideas into a full-blown image interpretation system.  A minimal code sketch follows; to see an example of an actual mask transfer, check out the figure below.
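Here is a minimal sketch of what mask transfer looks like in code.  The detection tuple format and the per-class voting scheme are my own simplifications for illustration, not the authors' exact formulation.

```python
import numpy as np

def transfer_masks(detections, image_shape, n_classes):
    # Accumulate per-pixel, per-class votes from every detection.
    votes = np.zeros((image_shape[0], image_shape[1], n_classes))
    for (x0, y0, x1, y1), score, cls, mask in detections:
        # Each exemplar detection carries the exemplar's own binary
        # segmentation mask (assumed here to be already resized to the
        # detection box); paste it into the box, weighted by confidence.
        votes[y0:y1, x0:x1, cls] += score * mask
    # The per-pixel parse is simply the highest-voted class.
    return votes.argmax(axis=2)
```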


Their system produces a per-pixel labeling of the input image, and as you can see below, the results are quite good.  Here are some more outputs of their system compared to purely region-based and purely detector-based systems.  Per-exemplar detectors clearly complement superpixel-based "segmentation-driven" approaches.



This paper will be presented as an oral in Session 3C, "Context and Scenes," on Thursday, June 27th at CVPR 2013 in Portland, Oregon.

Tuesday, June 18, 2013

Must-see Workshops @ CVPR 2013

June is that wonderful month during which computer vision researchers, students, and entrepreneurs go to CVPR -- the premier yearly computer vision conference.  Whether you are presenting a paper, learning about computer vision, networking with academic colleagues, looking for rock-star vision experts to join your start-up, or looking for rock-star vision start-ups to join, CVPR is where all of the action happens!  If you're not planning on going, it's not too late!  The conference starts next week in Portland, Oregon.


There are lots of cool papers at CVPR, many of which I have already studied in great detail, and many others which I will learn about next week.  I will write about some of the cool papers/ideas I encounter while I'm at CVPR next week.  In addition to the main conference, CVPR has 3 action-packed workshop days.  I want to take this time to mention two super-cool workshops which are worth checking out during CVPR 2013.  Workshop talks are generally better than main-conference talks, since the invited speakers tend to be more senior and they get to present a broader view of their research (compared to the content of a single 8-page research paper, as is typically discussed during the main conference).

SUNw: Scene Understanding Workshop
Sunday June 23, 2013


From the webpage: Scene understanding started with the goal of building machines that can see like humans to infer general principles and current situations from imagery, but it has become much broader than that. Applications such as image search engines, autonomous driving, computational photography, vision for graphics, human machine interaction, were unanticipated and other applications keep arising as scene understanding technology develops. As a core problem of high level computer vision, while it has enjoyed some great success in the past 50 years, a lot more is required to reach a complete understanding of visual scenes.

I attended some of the other SUN workshops which were held at MIT during the winter months.  This time around, the workshop is co-located with CVPR, so by definition it will be accessible to more researchers.  Even though I have the pleasure of personally knowing the super-smart workshop organizers (Jianxiong Xiao, Aditya Khosla, James Hays, and Derek Hoiem), the most exciting tidbit about this workshop is the all-star invited speaker schedule.  The speakers include: Ali Farhadi, Yann LeCun, Fei-Fei Li, Aude Oliva, Deva Ramanan, Silvio Savarese, Song-Chun Zhu, and Larry Zitnick.  To hear some great talks about truly bleeding-edge research by some of vision's most talented researchers, come to SUNw.

VIEW 2013: Vision Industry and Entrepreneur Workshop
Monday, June 24, 2013



From the webpage: Once largely an academic discipline, computer vision today is also a commercial force. Startups and global corporations are building businesses based on computer vision technology. These businesses provide computer vision based solutions for the needs of consumers, enterprises in many commercial sectors, non-profits, and governments. The demand for computer vision based solutions is also driving commercial and open-source development in associated areas, including hardware and software platforms for embedded and cloud deployments, new camera designs, new sensing technologies, and compelling applications. Last year, we introduced the IEEE Vision Industry and Entrepreneur Workshop (VIEW) at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) to bridge the academic and commercial worlds of computer vision. 

I include this workshop in the must-see list because the time is right for computer vision researchers to start innovating at start-ups.  First of all, the world wants your vision-based creations today.  With the availability of smart phones and widespread broadband access, the world does not want to wait a decade for the full academic research pipeline to get translated into products.  Seeing such workshops at CVPR is exciting, because this will help breed a new generation of researcher-entrepreneurs.  I, for one, welcome our new company-starting computer vision overlords.