Wednesday, August 24, 2011

The vision hacker culture at Google ...

I sometimes get frustrated when developing machine learning algorithms in C++.  And since working in object recognition basically means you have to be a machine learning expert, trying something new and exciting in C++ can be extremely painful.  I don't miss the C++-heavy workflow for vision projects at Google.  C++ is great for building large-scale systems, but not for pioneering object recognition representations.  I like to play with pixels and I like to think of everything as matrices.  But programming languages, software engineering philosophies, and other coding issues aren't going to be today's topic.  Today I want to talk about the one thing that is more valuable than computers, and that is people.  Not just people, but a community of people -- in particular, the culture at Google, and more specifically vision@Google.

I miss being around the hacker culture at Google.  

The people at Google aren't just hackers, they are Jedis when it comes to building great stuff -- and that is why I recommend a Google internship to many of my fellow CMU vision Robograds (fyi, Robograds are CMU Robotics Graduate Students).  CMU-ers, like Googlers, like to build stuff.  However, CMU-ers are typically younger.

What is a software engineering Jedi, you might ask? 'Tis one who is not afraid of a million cores, one who is not afraid of building something great.  While little boys get hurt by the guns 'n knives of C++, Jedi use their tools like ninjas use their swords. You go into Google as a boy, you come out a man.  NOTE: I do not recommend going to Google and just toying around in Matlab for 3 months.  Build something great, find a Yoda-esque mentor, or at least strive to be a Jedi.  There's plenty of time in grad school for Matlab and writing papers.  If you get a chance to go to Google, take the opportunity to go large-scale and learn to MapReduce like the pros.

Every day I learn about more and more people I respect in vision and learning going to Google, or at least interning there (e.g., Andrej Karpathy who is starting his PhD@Stanford and Santosh Divvala who is a well-known CMU PhD student and vision hacker).  And I really can't blame them for choosing Google over places like Microsoft for the summer.  I can't think of many better places to be -- the culture is inimitable.  I spent two summers in Jay Yagnik's group, and some of the great people I interned with are already full-time Googlers (e.g., Luca Bertelli and Mehmet Emre Sargin).  And what is really great about vision@google is that these guys get to publish surprisingly often!  Not just throw-away-code kind of publish, but stuff that fits inside large-scale systems -- stuff which is already inside Google products.  The technology is often inside the Google product before the paper goes public!  Of course it's not easy to publish at a place like Google because there is just way too much exciting large-scale stuff going on.  Here is a short list of some cool 2010/2011 vision papers (from vision conferences) with significant Googler contributions.

Kernelized Structural SVM Learning

“Kernelized Structural SVM Learning for Supervised Object Segmentation”, Luca Bertelli, Tianli Yu, Diem Vu, Burak Gokturk, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 2011.
[abstract] [pdf]

Finding Meaning on YouTube

“Finding Meaning on YouTube: Tag Recommendation and Category Discovery”, George Toderici, Hrishikesh Aradhye, Marius Pasca, Luciano Sbaiz, Jay Yagnik, Computer Vision and Pattern Recognition, 2010.
[abstract] [pdf]

Here is a very exciting and new paper from SIGGRAPH 2011.  It is a sort of Visual Memex for faces -- congratulations on this paper, guys!  Check out the video below.

Exploring Photobios Movie

Exploring Photobios from Ira Kemelmacher on Vimeo

Ira Kemelmacher-Shlizerman, Eli Shechtman, Rahul Garg, Steven M. Seitz. "Exploring Photobios." ACM Transactions on Graphics (SIGGRAPH), Aug 2011. [pdf]

Finally, here is a very mathematical paper with a sexy title from the vision@google team.  It will be presented at the upcoming ICCV 2011 Conference in Barcelona -- the same conference where I'll be presenting my Exemplar-SVM paper.  

The Power of Comparative Reasoning
Jay Yagnik, Dennis Strelow, David Ross, Ruei-Sung Lin. ICCV 2011. [PDF]

P.S. If you're a fellow vision blogger, then come find me in Barcelona@iccv2011 -- we'll go grab a beer.

Tuesday, August 16, 2011

Question: What makes an object recognition system great?

Today, instead of discussing my own perspectives on object recognition or sharing some useful links, I would like to ask a general question geared towards anybody working in the field of computer vision:

What makes an object recognition system great?

In particular, I would like to hear a broad range of perspectives regarding what is necessary to provide an impact-creating open-source object recognition system for the research community to use.  As a graduate student you might be interested in building your own recognition system, as a researcher you might be interested in extending or comparing against a current system, and as an educator you might want to direct your students to a fully-functional object recognition system which could be used to bootstrap their research.

To start the discussion I would like to first enumerate a few elements which I find important in making an object recognition system great.

Open Source
In order for object recognition to progress, I think releasing binary executables is simply not enough.  Allowing others to see your source code means that you gain more scientific credibility and you let others extend your system -- this means letting others both train and test variants of your system. More people using an object recognition system also translates to a higher citation count, which is favorable for researchers seeking career advancement.  Felzenszwalb et al. have released multiple open-source versions of their Discriminatively Trained Deformable Part Model -- each time we see a new release it gets better!  Such continual development means that we know the authors really care about this problem.  I feel Github, with its distributed version control and social-coding features, is a powerful tool the community should adopt -- something I believe is very much needed to take the community's ideas to the next level.  In my own research (e.g., the Ensemble of Exemplar-SVMs approach), I have started using Github (for both private and public development) and I love it. Linux might have been started by a single individual, but it took a community to make it great.  Just look at where Linux is now.

Ease of use
For ease of use, it is important that the system is implemented in a popular language which is known by a large fraction of the vision community.  Matlab, Python, C++, and Java are such popular languages, and many good implementations are a combination of Matlab with some highly-optimized routines in C++.  Good documentation is also important since one cannot expect only experts to be using such a system.

Strong research results
The YaRS approach, which is the "yet-another-recognition-system" approach, doesn't translate to high usage unless the system actually performs well on a well-accepted object recognition task.  Every year at vision conferences, many new recognition frameworks are introduced, but only a few of them ever pass the test of time.  Usually an idea withstands time because it is a conceptual contribution to science, but systems such as the HOG-based pedestrian detector of Dalal-Triggs and the Latent Deformable Part Model of Felzenszwalb et al. are actually being used by many other researchers.  The ideas in these works are not only good, but the recognition systems are great.

So what would you like to see in the next generation of object recognition systems?  I will try my best to reply to any comments posted below.  Any really great comment might even trigger a significant discussion; enough to warrant its own blog post.  Anybody is welcome to comment/argue/speculate below, either using their real name or anonymously.

Monday, August 15, 2011

CMU's Black Fridays: a graduating PhD student's perspective

I've always been amazed that the CMU department really knows what its PhD students are up to -- the big stuff as well as the little stuff.

Allow me to elaborate.  As a PhD student at CMU, you receive a sort of "report card" at the end of every semester on what is known as "Black Friday." You first submit a short summary of your accomplishments and goals for next semester.  Then, on this special day, the professors talk to each other about their students (probably in some secret sound-proof discussion room).  While faculty are discussing our fates, we, the students, do the opposite.  We relax, watch movies, play games, and *imagine* what our superiors are discussing.  Black Friday letters serve two roles: a role for the faculty, and a role for the students.


For the faculty, Black Friday is a way for the department to evaluate and monitor the progress of their PhD students.  Faculty members get a chance to discuss their students' hardships as well as their successes.  For the students, these letters are a way to keep us in *check*.

"You better check yo self before you wreck yo self" - Ice Cube

The Black Friday letter lets us know explicitly what research qualifiers, writing qualifiers, etc., they expect us to complete next semester.  They let us know whether they are happy or unhappy with our progress.


In practice, I found the letters to be a combination of "the good" and "the bad."  The good (a statement such as "we are happy that you got your paper accepted to ___") is like getting A's in grade school -- a yipee! moment.  The bad (a statement such as "we have noticed that you are struggling taking your experimental research to the next level, and feel that you are spending too much time on ___"), is what pushes us to experience those yipee! moments the following semester.  In the past, my Black Friday letters have included details regarding elements of my research and coursework that I didn't know any faculty member even knew about.  But the faculty care!  The students are the future, and there is nothing like critical feedback to help us achieve our goals.  There have been several times during my 6 years at CMU's Robotics Institute when my letter helped keep me in check.

I only wish there was a way for students to provide Black Friday letters to their faculty mentors...

Further reading:
Black Friday from Peter Lee
How do you evaluate your grad students? by Matt Welsh

Saturday, August 13, 2011

blazing fast nms.m (from exemplar-svm library)

If you care about building large-scale object recognition systems, you have to care about speed.  And every little bit of performance counts -- so why not first optimize the stuff which is slowing you down?

NMS (non-maximum suppression) is a very popular post-processing method for eliminating redundant object detection windows.  I have taken Felzenszwalb et al.'s nms.m and made it significantly faster by eliminating an inner loop.  6 years of grad school, 6 years of building large-scale vision systems in Matlab, and you really learn how to vectorize code.  The code I call millions of times needs to be fast, and nms is one of those routines I call all the time.

The code is found below as a Github gist -- which was taken from my Exemplar-SVM object recognition library (from my ICCV2011 paper: Ensemble of Exemplar-SVMs for Object Detection and Beyond).  The same file, nms.m, can also be found as part of the Exemplar-SVM library on Github.  In fact, this code produces the same result as Pedro's code, but is much faster.  Here is one timing experiment, performing nms on ~300K windows, where my version is roughly 100 times faster.  When you deal with exemplar-SVMs you have to deal with lots of detectors (i.e., lots of detection windows), so fast NMS is money.

>> tic;top = nms_original(bbs,.5);toc
Elapsed time is 58.172313 seconds.

>> tic;top = nms_fast(bbs,.5);toc
Elapsed time is 0.532638 seconds.
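The original nms.m is Matlab, but the vectorization trick translates directly to any array language.  Here is a rough Python/NumPy sketch of the same idea (my own illustrative port, not the library's code -- the function name, box layout, and overlap convention are assumptions): the inner loop over remaining boxes is replaced by one vectorized overlap computation against the current top-scoring box.

```python
import numpy as np

def nms_fast(boxes, overlap=0.5):
    """Greedy non-maximum suppression with a vectorized inner step.

    boxes: (N, 5) array of [x1, y1, x2, y2, score] rows.
    Returns the surviving rows of `boxes`.
    """
    if len(boxes) == 0:
        return boxes
    x1, y1, x2, y2, s = boxes.T
    area = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = np.argsort(s)[::-1]          # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # vectorized intersection of the top box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        # overlap relative to each remaining box's own area (one common
        # convention; check against nms.m before swapping this in)
        ov = (w * h) / area[order[1:]]
        order = order[1:][ov <= overlap]
    return boxes[np.array(keep)]
```

The win comes from the fact that each pass suppresses every redundant box against the current winner in one shot, instead of looping over candidates one at a time.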

Friday, August 12, 2011

Ensemble of Exemplar-SVMs for Object Detection and Beyond

Over the next couple of days I will be announcing some very exciting news.  As many of you know, I defended my PhD this past Monday at CMU.  My family and friends came for the presentation as I defended 6 years of my life in front of Alyosha Efros, Martial Hebert, Takeo Kanade, and Pietro Perona.  You might be wondering what I've been up to this past year -- what sort of new vision research have I produced since the Visual Memex paper.

Throughout the last year or so I have slowly abandoned the segment-then-recognize approach and fully embraced the exemplar-based component of my research.  Because once you go exemplar, you don't go back!  If only Nosofsky were here, he would be proud.  Once you have established a good exemplar-detection alignment, problems such as segmentation become trivial.  In fact, exemplar association enables a host of meta-data transfer applications.  Here is a quick overview of my recent ICCV 2011 paper with Alexei Efros and Abhinav Gupta (the super new and exciting professor at CMU who will likely revolutionize the way we, vision researchers, think about the interplay of geometric reasoning and object recognition).  

I will be defending my work to the ICCV crowd this fall in Barcelona.  Here is the paper.

Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Ensemble of Exemplar-SVMs for Object Detection and Beyond . In ICCV, 2011. [PDF] [Project Page]

This paper proposes a conceptually simple but surprisingly powerful method which combines the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach. 

Exemplar Associations go Beyond Bounding Boxes

The method is based on training a separate linear SVM classifier for every exemplar in the training set. Each of these Exemplar-SVMs is thus defined by a single positive instance and millions of negatives. 
An ensemble of exemplars

While each detector is quite specific to its exemplar, we empirically observe that an ensemble of such Exemplar-SVMs offers surprisingly good generalization. Our performance on the PASCAL VOC detection task is on par with the much more complex latent part-based model of Felzenszwalb et al., at only a modest computational cost increase. 
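The core training recipe -- one linear SVM per exemplar, with a single positive and a sea of negatives weighted asymmetrically -- can be sketched in a few lines.  This is a toy illustration, not the paper's solver: the function name, the plain full-batch subgradient descent, and the cost values are my assumptions (the real system uses a proper SVM solver with hard-negative mining over millions of windows).

```python
import numpy as np

def train_exemplar_svm(exemplar, negatives, c_pos=0.5, c_neg=0.01,
                       lr=0.01, epochs=200):
    """Toy sketch of a single Exemplar-SVM: a linear classifier with one
    positive (the exemplar's feature vector) and many negatives, trained
    by subgradient descent on a cost-weighted hinge loss + L2 penalty.
    c_pos >> c_neg mirrors the idea of weighting the lone positive
    heavily relative to each individual negative."""
    d = exemplar.shape[0]
    w, b = np.zeros(d), 0.0
    X = np.vstack([exemplar[None, :], negatives])
    y = np.array([1.0] + [-1.0] * len(negatives))
    cost = np.array([c_pos] + [c_neg] * len(negatives))
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                             # hinge-loss violators
        grad_w = w - (cost[viol] * y[viol]) @ X[viol]  # reg + hinge subgradient
        grad_b = -np.sum(cost[viol] * y[viol])
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

An "ensemble" is then just one such (w, b) pair per exemplar: each detector fires independently, and every detection carries a pointer back to the exemplar that produced it.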

Generalization from a single positive instance

But the central benefit of our approach is that it creates an explicit association between each detection and a single training exemplar. Because most detections show good alignment to their associated exemplar, it is possible to transfer any available exemplar meta-data (segmentation, geometric structure, 3D model, etc.) directly onto the detections, which can then be used as part of overall scene understanding.
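To make the transfer step concrete, here is a minimal sketch under strong simplifying assumptions (axis-aligned boxes, a binary mask, nearest-neighbor resampling; the function name and box format are my own invention, and the actual system handles richer meta-data than this):

```python
import numpy as np

def transfer_mask(exemplar_mask, det_box):
    """Toy segmentation transfer: resample the exemplar's binary mask
    into the detection window via nearest-neighbor scaling, so that the
    exemplar's segmentation lands on the detected object."""
    x1, y1, x2, y2 = det_box
    H, W = y2 - y1, x2 - x1
    eh, ew = exemplar_mask.shape
    ys = np.arange(H) * eh // H          # nearest-neighbor row lookup
    xs = np.arange(W) * ew // W          # nearest-neighbor column lookup
    return exemplar_mask[ys[:, None], xs[None, :]]
```

Because the detection is already well aligned to its exemplar, even this crude per-window rescaling puts the transferred mask roughly on top of the detected object; geometry or 3D meta-data transfers the same way.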

This paper can be rightfully seen as a marriage of my older work on learning per-exemplar distances with the discriminative training method of Felzenszwalb et al.

Here are some summary pictures from my paper and a short description of each one:

1. Going beyond object detection (i.e., producing a category-labeled bounding box), we look at several meta-data transfer applications.  Meta-data transfer is a way of interpreting an object detection that transcends category membership.  The first task is that of geometry transfer.

Geometry Transfer

2. Segmentation is a well-known problem in computer vision -- generally tackled with bottom-up approaches which strive to produce coherent regions based on pixel-to-pixel appearance similarity.  We show that a recognize-then-segment approach is possible, and in particular an associate-then-segment approach based on transferring segmentations from exemplars onto detection windows. 

Segmentation Transfer

3. Object exemplars often show an interplay of objects, suggesting that it is possible to use the recognition of one object to prime the presence of another. 
Related Object Priming

P.S. Dr. Abhinav Gupta is looking for students, so if you are a 1st year CMU visionary (CMU visionary = robotics vision student@CMU), check out his presentation during the RI Immigration Course.

P.P.S. Anonymous Reviewer #3: Not only have you single-handedly saved my paper from the clutches of ICCV death, but you have resurrected a graduate student's faith in the justice of the vision peer review process.