Sunday, April 21, 2013

International Conference on Computational Photography 2013 (ICCP 2013) Day 1 recap

Yesterday was the first day of ICCP 2013.  While the conference should have started Friday, it was postponed until Saturday due to the craziness in Boston.  Nevertheless, it was an excellent day of mingling with colleagues, listening to talks, and browsing posters.  Here are some noteworthy items from Saturday:

Marc Levoy (from Stanford and Google) gave a keynote about Google Glass and how it will change the game of photography and video collection.  Marc was one of three Googlers wearing Glass.  The other two were Sam Hasinoff (former MIT CSAILer) and Peyman Milanfar (from UCSC/Google).  I had the privilege of chatting with Prof. Milanfar during Saturday night's reception at the Harvard Faculty club and got to share my personal views on what Glass means for Robotics researchers like myself.

Marc Levoy at ICCP 2013

During his presentation, Matthias Grundmann from Georgia Tech talked about his work on radiometric self-calibration of videos, and the implications of this work for visual object recognition from YouTube videos are fairly evident.  In other words, why make your machine learning algorithm cope with appearance variations caused by the imaging process when they can be removed!
Matthias Grundmann at ICCP 2013

Post-processing Approach for Radiometric Self-Calibration of Video. Matthias Grundmann (Georgia Tech), Chris McClanahan (Georgia Tech), Sing Bing Kang (Microsoft Research), Irfan Essa (Georgia Tech). ICCP 2013
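To make the "remove the imaging variation" idea concrete, here is a toy sketch (mine, not the paper's algorithm): radiometric self-calibration assumes observed intensities I = f(E) for scene irradiance E under an unknown camera response f. If we assume a simple gamma response and two frames with a known exposure ratio, the gamma can be recovered and inverted so that downstream recognition sees linearized intensities.

```python
import numpy as np

# Toy sketch (NOT the paper's method): assume a gamma camera response
# f(E) = E**g and two registered frames of the same scene whose exposure
# ratio is known. Recover g, then invert it to linearize the intensities.

rng = np.random.default_rng(0)
E = rng.uniform(0.05, 1.0, size=1000)   # latent scene irradiance
g = 2.2                                  # unknown gamma we want to recover
ratio = 2.0                              # known exposure ratio between frames

I1 = E ** g                              # frame 1
I2 = (ratio * E) ** g                    # frame 2, with double the exposure

# For a gamma response, log(I2) - log(I1) = g * log(ratio) at every pixel,
# so g is the mean log-intensity difference divided by log(ratio).
g_hat = np.mean(np.log(I2) - np.log(I1)) / np.log(ratio)

E_hat = I1 ** (1.0 / g_hat)              # linearized frame 1
print(round(g_hat, 3))                   # -> 2.2
```

A real system must handle arbitrary (non-gamma) response curves, noise, and unregistered video, which is exactly what makes the paper's post-processing approach non-trivial.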

Hany Farid from Dartmouth College presented an excellent keynote on Image Forensics.  Image manipulators beware!  His work is not going to make image forgery impossible, but it will take it out of the hands of amateurs.
Hany Farid at ICCP 2013

The best paper award was given to the following paper:
"3Deflicker from Motion" by Yohay Swirski (Technion), Yoav Schechner (Technion)

Good job Yohay and Yoav!

Finally, we (the MIT object detection hackers) will be setting up our own wearable computing platform, the HOGgles box, for the Demo session during lunch.  Carl Vondrick, Aditya Khosla, and I will also be there during the coffee breaks after lunch with the HOGgles demo.

Today should be as much fun as yesterday, and I will try to upload some videos of HOGgles in action later tonight.

Friday, April 19, 2013

Can you pass the HOGgles test? Inverting and Visualizing Features for Object Detection

Despite more than a decade of incessant research by some of the world's top computer vision researchers, we still ask ourselves "Why is object detection such a difficult problem?"

Surely, better features, better learning algorithms, and better models of visual object categories will result in improved object detection performance.  But instead of waiting an indefinite time until the research world produces another Navneet Dalal (of HOG fame) or Pedro Felzenszwalb (of DPM fame), we (the vision researchers in Antonio Torralba's lab at MIT) felt the time was ripe to investigate object detection failures from an entirely new perspective.

When we (the researchers) look at images, the problem of object detection appears trivial; however, object detection algorithms don't typically analyze raw pixels -- they analyze images in feature spaces!  The Histogram of Oriented Gradients feature (commonly known as HOG) is the de facto standard in object detection these days.  While looking at gradient distributions might make sense for machines, we felt that these features were incomprehensible to the (human) researchers who have to make sense of object detection failures.  Here is a motivating quote from Marcel Proust (a French novelist), which most accurately describes what we did:
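For readers unfamiliar with HOG, here is a minimal NumPy sketch of the idea (illustrative only; the function name and parameters are mine, and this omits the block normalization and interpolation of the full Dalal-Triggs implementation): gradient orientations are binned into per-cell histograms, and these histograms, not raw pixels, are what detectors actually "see".

```python
import numpy as np

def hog_cells(img, cell=8, bins=9):
    """Return an (H/cell, W/cell, bins) array of orientation histograms."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in standard HOG.
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    h, w = img.shape
    out = np.zeros((h // cell, w // cell, bins))
    for i in range(h // cell):
        for j in range(w // cell):
            m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            a = ang[i*cell:(i+1)*cell, j*cell:(j+1)*cell].ravel()
            # Magnitude-weighted orientation histogram for this cell.
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            out[i, j] = hist / (np.linalg.norm(hist) + 1e-6)  # L2 normalize
    return out

# A vertical step edge yields purely horizontal gradients (orientation
# near 0 degrees), so the first orientation bin dominates in those cells.
img = np.zeros((16, 16))
img[:, 8:] = 1.0
feat = hog_cells(img)
print(feat.shape)  # -> (2, 2, 9)
```

Staring at a 2x2x9 array of numbers like this one is precisely the incomprehensibility problem that motivated HOGgles.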

"The real voyage of discovery consists not in seeking new landscapes but in having new eyes." -- Marcel Proust

In short, we built new eyes.  These new "eyes" are a method for converting machine readable features into human readable RGB images.  We take a statistical machine learning approach to visualization -- we learn how to invert HOG using ideas from sparse coding and large-scale dictionary learning.  Let me briefly introduce the concept of HOGgles (i.e., HOG glasses).
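The paired-dictionary idea behind the inversion can be sketched as follows (a toy with synthetic data and my own variable names, not the trained model from the paper): two dictionaries share the same sparse codes, one mapping codes to HOG features and one mapping codes to image patches. To invert a HOG vector, infer its code against the feature dictionary, then decode the code with the pixel dictionary.

```python
import numpy as np

# Toy sketch of paired-dictionary inversion (illustrative, not the paper's
# learned model). Two dictionaries share sparse codes:
#   U maps codes -> HOG features,  V maps the SAME codes -> image patches.
# To invert a HOG vector y: infer its code alpha from U, output V @ alpha.

rng = np.random.default_rng(1)
n_atoms, d_feat, d_pix = 20, 36, 64
U = rng.normal(size=(d_feat, n_atoms))   # feature-space dictionary
V = rng.normal(size=(d_pix, n_atoms))    # pixel-space dictionary

# Ground-truth sparse code: 3 active atoms out of 20.
alpha_true = np.zeros(n_atoms)
alpha_true[[2, 7, 13]] = [1.0, -0.5, 2.0]
y = U @ alpha_true                       # the "observed" HOG feature
x_true = V @ alpha_true                  # the patch we hope to recover

# Infer the code with plain least squares (a stand-in for the sparse-coding
# step; a real system would use lasso/OMP and a learned dictionary here).
alpha_hat, *_ = np.linalg.lstsq(U, y, rcond=None)
x_hat = V @ alpha_hat                    # the inverted ("iHOG") patch
print(np.allclose(x_hat, x_true, atol=1e-6))
```

The hard part in practice is learning paired dictionaries from real image/HOG pairs at scale, which is where the large-scale dictionary learning mentioned above comes in.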

Taken from Carl Vondrick's project abstract:
We present several methods to visualize the HOG feature space, a common descriptor for object detection. The tools in this paper allow humans to put on "HOG glasses" and see the visual world as a computer might see it.

Here is a short video (a movie trailer for Terminator) showing the manually engineered HOG visualization (commonly known as the HOG glyph), the original image, and our learned iHOG visualization.

We are presenting a real-time demo of this new and exciting line of work at the 2013 International Conference on Computational Photography (ICCP 2013), which is being held at Harvard University this weekend (4/19/2013 - 4/21/2013).  If you want to try our sexy wearable platform and become a real-life object detector for a few minutes, then come check us out at this Sunday morning's demo session at ICCP 2013 at Harvard University.

Also, if you thought TorralbaArt was cool, you must check out VondrickArt (a result of trying to predict color using the iHOG visualization framework).

Project-related Links:

Project website:
Project code (MATLAB-based) on Github:
arXiv paper:

Authors' webpages:

Carl Vondrick (MIT PhD student):
Aditya Khosla (MIT PhD student):
Tomasz Malisiewicz (MIT Postdoctoral Fellow):
Antonio Torralba (MIT Professor):

We hope that with these new eyes, we (the vision community) will better understand the failures and successes of machine vision systems.  I, for one, welcome our new HOGgles-wearing overlords.

UPDATE: We will be presenting the paper at ICCV 2013.  An MIT News article covering the research can be found here: