Friday, April 19, 2013

Can you pass the HOGgles test? Inverting and Visualizing Features for Object Detection

Despite more than a decade of incessant research by some of the world's top computer vision researchers, we still ask ourselves "Why is object detection such a difficult problem?"

Surely, better features, better learning algorithms, and better models of visual object categories will result in improved object detection performance.  But instead of waiting indefinitely for the research world to produce another Navneet Dalal (of HOG fame) or Pedro Felzenszwalb (of DPM fame), we (the vision researchers in Antonio Torralba's lab at MIT) felt the time was ripe to investigate object detection failures from an entirely new perspective.

When we (the researchers) look at images, the problem of object detection appears trivial; however, object detection algorithms typically don't analyze raw pixels: they analyze images in feature spaces!  The Histogram of Oriented Gradients feature (commonly known as HOG) is the de facto standard in object detection these days.  While looking at gradient distributions might make sense for machines, we felt that these features were incomprehensible to the (human) researchers who have to make sense of object detection failures.  Here is a motivating quote from the French novelist Marcel Proust that best describes what we did:
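For readers who have never looked inside HOG, here is a rough sketch of what "gradient distributions" means: within a small cell of pixels, each pixel votes for an orientation bin, weighted by its gradient magnitude. This is a simplified, hypothetical helper (the name `cell_orientation_histogram` is mine, not from any HOG or iHOG code). Real HOG implementations also interpolate votes between neighboring bins and cells and contrast-normalize over blocks of cells, all of which this sketch omits.

```python
import math


def cell_orientation_histogram(patch, num_bins=9):
    """Return a simplified unsigned-gradient orientation histogram for one cell.

    patch: 2-D list of grayscale intensities (rows of equal length).
    Each interior pixel votes for the bin of its gradient orientation,
    folded into [0, 180) degrees and weighted by its gradient magnitude.
    Hypothetical sketch: no bin/cell interpolation, no block normalization.
    """
    hist = [0.0] * num_bins
    bin_width = 180.0 / num_bins
    rows, cols = len(patch), len(patch[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Centered finite differences (a [-1, 0, 1] derivative filter).
            gx = patch[r][c + 1] - patch[r][c - 1]
            gy = patch[r + 1][c] - patch[r - 1][c]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue  # flat pixels cast no vote
            # Unsigned orientation: a dark-to-light edge and a light-to-dark
            # edge with the same direction land in the same bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % num_bins] += mag
    return hist
```

For example, a vertical step edge such as `[[0, 0, 1, 1]] * 4` puts all of its votes into the first (0-degree) bin. A detector sees thousands of such small histograms rather than the pixels themselves, which is why HOG is so hard for a person to read directly.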

“The real voyage of discovery consists not in seeking new landscapes but in having new eyes.” -- Marcel Proust



In short, we built new eyes.  These new "eyes" are a method for converting machine-readable features into human-readable RGB images.  We take a statistical machine learning approach to visualization: we learn how to invert HOG using ideas from sparse coding and large-scale dictionary learning.  Let me briefly introduce the concept of HOGgles (i.e., HOG glasses).
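To make "learning to invert HOG with sparse coding" a bit more concrete, here is a minimal sketch under stated assumptions. It assumes two dictionaries have already been learned offline: `U`, whose columns are image-patch atoms, and `V`, whose columns are the HOG features of those same atoms, so that a single sparse coefficient vector describes a patch in both spaces. Given a HOG vector `y`, the sketch finds sparse coefficients by solving a lasso problem with ISTA (iterative soft-thresholding, my choice of solver) and reconstructs pixels as `U` times those coefficients. All names and parameters are hypothetical illustrations; this is not the MATLAB code from the project repository.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (the proximal step of the L1 norm)."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def matvec(M, v):
    """Return M v for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matvec_transposed(M, v):
    """Return M^T v for a matrix stored as a list of rows."""
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]


def sparse_code(V, y, lam=0.01, iters=500):
    """Approximately solve min_a 0.5*||V a - y||^2 + lam*||a||_1 with ISTA.

    V: hypothetical HOG-space dictionary (rows = feature dims, columns = atoms).
    """
    # 1 / ||V||_F^2 never exceeds 1 / ||V||_2^2, so this step size is safe for ISTA.
    step = 1.0 / sum(x * x for row in V for x in row)
    alpha = [0.0] * len(V[0])
    for _ in range(iters):
        # Gradient of the squared-error term: V^T (V alpha - y).
        residual = [p - t for p, t in zip(matvec(V, alpha), y)]
        grad = matvec_transposed(V, residual)
        # Gradient step followed by L1 shrinkage keeps the code sparse.
        alpha = [soft_threshold(a - step * g, step * lam) for a, g in zip(alpha, grad)]
    return alpha


def invert_feature(U, V, y, lam=0.01, iters=500):
    """Map a feature vector y back to pixels through coefficients shared by U and V.

    U: hypothetical pixel-space dictionary whose columns pair with those of V.
    """
    alpha = sparse_code(V, y, lam, iters)
    return matvec(U, alpha)
```

Because HOG discards information, many different images map to the same feature vector; in this sketch it is the sparsity penalty `lam` that selects one plausible reconstruction among them, expressed as a combination of a few pixel-space atoms.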

Taken from Carl Vondrick's project abstract:
We present several methods to visualize the HOG feature space, a common descriptor for object detection. The tools in this paper allow humans to put on "HOG glasses" and see the visual world as a computer might see it.

Here is a short video (the trailer for the movie Terminator) that shows the manually engineered HOG visualization (commonly known as the HOG glyph), the original image, and our learned iHOG visualization.


We are presenting a real-time demo of this new and exciting line of work at the 2013 International Conference on Computational Photography (ICCP 2013), which is being held at Harvard University this weekend (4/19/2013 - 4/21/2013).  If you want to try our sexy wearable platform and become a real-life object detector for a few minutes, come check us out at the ICCP 2013 demo session this Sunday morning.



Also, if you thought TorralbaArt was cool, you must check out VondrickArt (the result of trying to predict color using the iHOG visualization framework).

Project-related Links:

Project website: http://mit.edu/vondrick/ihog/
Project code (MATLAB-based) on Github: https://github.com/CSAILVision/ihog
arXiv paper: http://mit.edu/vondrick/ihog/techreport.pdf

Authors' webpages:

Carl Vondrick (MIT PhD student): http://mit.edu/vondrick/
Aditya Khosla (MIT PhD student): http://people.csail.mit.edu/khosla/
Tomasz Malisiewicz (MIT Postdoctoral Fellow): http://people.csail.mit.edu/tomasz/
Antonio Torralba (MIT Professor): http://web.mit.edu/torralba/www/

We hope that with these new eyes, we (the vision community) will better understand the failures and successes of machine vision systems.  I, for one, welcome our new HOGgles-wearing overlords.

UPDATE: We will be presenting the paper at ICCV 2013.  An MIT News article covering the research can be found here:
http://web.mit.edu/newsoffice/2013/teaching-computers-to-see-by-learning-to-see-like-computers-0919.html