Wednesday, August 24, 2011

The vision hacker culture at Google ...

I sometimes get frustrated when developing machine learning algorithms in C++.  And since working in object recognition basically means you have to be a machine learning expert, trying something new and exciting in C++ can be extremely painful.  I don't miss the C++ heavy workflow for vision projects at Google.  C++ is great for building large-scale systems, but not for pioneering object recognition representations.  I like to play with pixels and I like to think of everything as matrices.  But programming languages, software engineering philosophies, and other coding issues aren't going to be today's topic.  Today I want to talk about the one thing that is more valuable that is computers, and that is people.  Not just people, but a community of people, and in particular the culture at Google -- in particular, vision@Google.

I miss being around the hacker culture at Google.  

The people at Google aren't just hackers, they are Jedis when it comes to building great stuff -- and that is why I recommend a Google internship to many of my fellow CMU vision Robograds (fyi, Robograds are CMU Robotics Graduate Students).  CMU-ers, like Googlers, like to build stuff.  However, CMU-ers are typically younger.

What is a software engineering Jedi, you might ask? Tis' one who is not afraid of million cores, one who is not afraid of building something great.  While little boys get hurt by the guns 'n knives of C++, Jedi use their tools like ninjas use their swords. You go into Google as a boy, you come out a man.  NOTE: I do not recommend going to Google and just toying around in Matlab for 3 months.  Build something great, find a Yoda-esque mentor, or at least strive to be a Jedi.  There's plenty of time in grad school for Matlab and writing papers.  If you get a chance to go to Google, take the opportunity to go large-scale and learn to MapReduce like the pros.

Every day I learn about more and more people I respect in vision and learning going to Google, or at least interning there (e.g., Andrej Karpathy who is starting his PhD@Stanford and Santosh Divvala who is a well-known CMU PhD student and vision hacker).  And I really can't blame them for choosing Google over places like Microsoft for the summer.  I can't think of many better places to be -- the culture is inimitable.  I spent two summers at Jay Yagnik's group some of the great people I interned with are already full-time Googlers (e.g. Luca Bertelli and Mehmet Emre Sargin).  And what is really great about vision@google is that these guys get to publish surprisingly often!  Not just throw-away-code kind of publish, but stuff that fits inside large-scale systems -- stuff which is already inside Google products.  The technology is often inside the Google product before the paper goes public!  Of course it's not easy to publish at a place like Google because there is just way too much exciting large-scale stuff going on.  Here is a short list of some cool 2010/2011 vision papers (from vision conferences) with significant Googler contributions.

Kernelized Structural SVM Learning

“Kernelized Structural SVM Learning for Supervised Object Segmentation”, Luca BertelliTianli Yu, Diem Vu, Burak Gokturk, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 2011.
[abstract] [pdf]

Finding Meaning on YouTube

“Finding Meaning on YouTube: Tag Recommendation and Category Discovery”, George Toderici,Hrishikesh AradhyeMarius Pasca, Luciano Sbaiz, Jay YagnikComputer Vision and Pattern Recognition, 2010.
[abstract] [pdf]

Here is a very exciting and new paper from SIGGRAPH 2011.  It is a sort of Visual Memex for faces -- congratulations on this paper, guys!  Check out the video below.

Exploring Photobios Movie

Exploring Photobios from Ira Kemelmacher on Vimeo

Ira Kemelmacher-Shlizerman, Eli Shechtman, Rahul Garg, Steven M. Seitz. "Exploring Photobios." ACM Transactions on Graphics (SIGGRAPH), Aug 2011. [pdf]

Finally, here is a very mathematical paper with a sexy title from the vision@google team.  It will be presented at the upcoming ICCV 2011 Conference in Barcelona -- the same conference where I'll be presenting my Exemplar-SVM paper.  

The Power of Comparative Reasoning
Jay Yagnik, Dennis Strelow, David Ross, Ruei-Sung Lin. ICCV 2011. [PDF]

P.S. If you're a fellow vision blogger, then come find me in Barcelona@iccv2011 -- we'll go brag a beer.


  1. +1 on the post, having interned at Google twice in Steve Seitz's group. Here are some more Google papers from recent CVPR:

    (Youtube video stabilization is particularly impressive)

  2. Thanks Rahul!

    The face constraint for video stabilization is really cool! I know Matthias Grundmann because he interned at Google during the Summer of 2008 (when I was there).

    If anybody knows any more super cool vision@google papers, let me know.

  3. I not a googler, but I do have something that may solve your c++ woes.

    I read your last post (Exemplar-SVM) and am now a subscriber.

    I've been working with some people on a new open source cross platform vision library:

    We are aiming to help get rid of some of the c++, IDE, config issues, and just let people start working on vision applications.

    If you would like to publish your code in addition to papers I encourage you to join in with us and help make it an awesome open source framework for vision.

  4. Hey X,
    I really like the idea behind your simplecv project, I've forked it on github and will take a look at it once I settle down in my new academic institution. This week has been my moving week and I won't get a chance to put my coding-hat back on until next week. I'm planning a post about vision tools/technologies and I might mention your project.

    I'm definitely excited about the opportunity to collaborate on vision frameworks because throughout my PhD I've written way too many frameworks as part of different recognition projects. The experience has been great, but I don't want the future generation of students to spend too much time doing the framework engineering (its not easy!). Giving the next generation a simple and great framework will enable them to produce cool stuff much faster.

  5. interesting article. I have to write and test my code in C++ (OpenCV) unlike others (I deal with vision problems on embedded platforms like beagle board). It takes longer to debug the programs though.