Tombone's Computer Vision Blog: March 2010

Monday, March 22, 2010

PhDs make many smart programmers become software engineering n00bs

This is true. Spend a couple of years in a PhD program, reading papers and writing throw-away code in Matlab, and it is easy to become a throw-away programmer, a sort of liability in the real world. It is no surprise many companies look down on hiring PhDs. I've seen kids enter the PhD program with real programming talent and exit as real software engineering n00bs. In graduate school, you might code for 6 years without anybody grading your code. If you get sloppy, you will be worse off than when you started.

The problem is that many advisors don't care about their students writing good code. Writing good papers and giving good presentations -- you will be told that this is what makes good PhD students. Who cares about writing good code? -- we'll just have some 'engineering' people re-write it once you become famous. This is what students across the globe are being fed. This is no surprise, because your advisor won't get tenure by turning you into a mean mathematically-inclined super hacker. Then again, your advisor won't care if you go bald, are malnourished, and have no life outside research. There are many things that one has to take care of oneself, and software development skills aren't any different.

Note to the real world looking to hire talent: You should grill, I mean really grill, fresh PhDs regarding their software development skills. Don't become mesmerized by their 4.0s, their long publication lists, and all their 'achievements.' If you want to hire a fresh PhD to write code, whether in a research or an engineering setting, then give them one hell-of-an-interview. I agree with Google's interview process. I studied for it, I am proud of my own software engineering skills, and I was proud to have been an intern at Google (twice). But I know of companies who were sorry they hired PhDs only to learn these recent graduates could only dabble on the board and would utterly fail at the terminal.

Note to PhDs looking to one day take our skill-set and impact the real world: Never stop learning and never stop writing good code. Never stop taking care of yourself. You were the brightest of the brightest before you started your PhD, and now you have 5-6 years to exit as a real superman. With all the mathematics and presentation skills you will acquire during a PhD, on top of good software engineering skills, you will become invaluable to the real world. It's a real shame to become less valuable to the outside world after 6 years of a strenuous PhD program. But nobody will give you the recipe for success. Nobody will tell you to exercise, but if you want to pound your brain with mental challenges for decades to come, you will need physical exercise in your daily regimen. Your advisors won't tell you that keeping up to date on the tools of the trade, and being a real hacker, is very valuable in the real world. You will be told that fast results = many papers and it's not worth writing good code.

After obtaining a PhD we should be role-models for the entire world. Seriously, why not? If a PhD is the highest degree that an institution can grant, then we should feel proud about getting one. But we are human, and one is only as strong as their weakest link. We should become super hackers, fear no quantum mechanics, fear no presentation in front of a crowd, and be all that one can be.

This is part of a series of posts aimed at finding flaws in the academic/PhD process and how it pertains to building strong/intelligent/confident individuals.

Thursday, March 18, 2010

Back to basics: Vision Science Transcends Mathematics

Vision (a.k.a. image understanding, image interpretation, perception, object recognition) is quite unlike some of the mathematical problems we were introduced to in our youth. In fact, thinking of vision as a "mathematical problem" in the traditional sense is questionable. An important characteristic of such "problems" is that by pointing them out we already have a notion of what it would be like to solve them. Does a child think of gravity as such a problem? Well, probably not, because without the necessary mathematical backbone there is no problem with gravity! It's just the way the world works! But once a child has been perverted by mathematics and introduced into the intellectual world of science, the world ceases to just be. The world becomes a massive equation.

Consider the seemingly elementary problem of finding the roots of a cubic polynomial. Many of us can recite the quadratic formula by heart, but not the one for cubics (try deriving even the simpler quadratic formula by hand). If we were given one evening and a whole lot of blank paper, we could try tackling this problem (no Google allowed!). While the probability of failure is quite high (and arguably most of us would fail), it would still make sense to speak of "coming closer to the solution". Maybe we could even solve the problem when some terms are missing, etc. The important thing here is that the notion of having reached a solution is well-defined. Also, once we've found the solution it would probably be easier to convince ourselves that it is correct (verification would be easier than actually coming up with the solution).
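To make the "verification is easier" point concrete, here is a minimal Python sketch. The cubic, the candidate values, the tolerance, and the function names are my own illustrative choices, not anything canonical. Checking a proposed root takes nothing more than plugging it in, whereas deriving the roots from scratch is the hard part.

```python
def evaluate_cubic(a, b, c, d, x):
    """Evaluate a*x^3 + b*x^2 + c*x + d using Horner's rule."""
    return ((a * x + b) * x + c) * x + d


def is_root(a, b, c, d, x, tol=1e-9):
    """Return True if x makes the cubic vanish to within tol (an assumed tolerance)."""
    return abs(evaluate_cubic(a, b, c, d, x)) <= tol


# Illustrative cubic: x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
coeffs = (1, -6, 11, -6)
candidates = [1, 2, 3, 4]

# Verifying each candidate is a single substitution.
verdicts = {x: is_root(*coeffs, x) for x in candidates}
print(verdicts)
```

Here 1, 2, and 3 pass the check and 4 does not. Whatever the candidates are, "solved" has a crisp meaning: the polynomial evaluates to zero.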

Vision is more like theoretical physics, psychology, and philosophy and less like the well-defined math problem I described above. When dealing with the math problem described above, we know what the symbols mean, we know the valid operations -- the game is already set in place. In vision, just like physics, psychology and philosophy, the notion of a fundamental operational unit (which happens to be an object for vision) isn't as rigidly defined as the Platonic Ideals used throughout mathematics. We know what a circle is, we know what a real-valued variable is, but what is a "car"? Consider your mental image of a car. Now remove a wheel and ask yourself, is this still a car? Surely! But what happens as we start removing more and more elements? At what point does this object cease to be a car and become a motor, a single tire, or a piece of metal? The circle, a Platonic Ideal, ceases to be a circle once it has suffered the most trivial of all perturbations -- any deviation from perfection, and boom! the circle ceases to be a circle.

Much of Computer Vision does not ask such metaphysical questions, as objects of the real world are seamlessly mapped to abstract symbols that our mathematically-inclined PhD students love to play with. I am sad to report that this naive mapping between objects of the real world and mathematical symbols isn't so much a question of style; it is basically the foundation of modern computer vision research. So what must be done to expand this parochial field of Vision into a mature field? Wake up and stop coding! I think Vision needs a sort of mental coup d'état, a fresh outlook on an old problem. Sometimes to make progress we have to start with a clean slate -- current visionaries do not possess the right tools for this challenging enterprise. We keep throwing higher-level mathematics at the problem, but maybe we are barking up the wrong tree? However, if mathematics is the only thing we are good at, then how are we to have a mature discussion which transcends mathematics? The window through which we peer circumscribes the world we see.

I believe if we are to make progress in this challenging endeavor, we must first become Renaissance men, a sort of Nietzschean Übermensch. We must understand what has been said about perception, space, time, and the structure of the universe. We must become better historians. We must study not only more mathematics, but more physics, more psychology, read more Aristotle and Kant, build better robots, engineer stabler software, become better sculptors and painters, become more articulate orators, establish better personal relationships, etc. Once we've mastered more domains of reality, and only then, will we have a better set of tools for coping with the paradoxes inherent in artificial intelligence. Because a better grasp on reality -- inching closer to enlightenment -- will result in asking more meaningful questions.

I am optimistic. But the enterprise which I've outlined will require a new type of individual, one worthy of the name Renaissance Man. We aren't interested in toy problems here, nor cute solutions. If we want to make progress, we must shape our lives and outlooks around this very fact. Two steps backwards and three steps forward. Rinse, lather, repeat.

Friday, March 05, 2010

Representation and Use of Knowledge in Vision: Barrow and Tenenbaum's Conclusion

To gain a better perspective on my research regarding the Visual Memex, I spent some time reading Object Categorization: Computer and Human Vision Perspectives, which contains many lovely essays on Computer Vision. The essays are recently written by titans of Computer Vision and contain a great many lessons learned from history. While such a 'looking back' on vision makes for a good read, it is also worthwhile to find old works 'looking forward' and anticipating the successes and failures of the upcoming generations.

In this 'looking forward' fashion, I want to share a passage regarding image understanding systems, from "Representation and Use of Knowledge in Vision," by H. G. Barrow and J. M. Tenenbaum, July 1975. This is a short paper worth reading for both graduate students and professors interested in pushing Computer Vision research to its limits. I enjoyed the succinct and motivational ending so much that it is worth repeating verbatim:

--------

III Conclusion

We conclude by reiterating some of the major premises underlying this paper:

The more knowledge the better.
The more data, the better.
Vision is a gigantic optimization problem.
Segmentation is low-level interpretation using general knowledge.
Knowledge is incrementally acquired.
Research should pursue Truth, not Efficiency.

A further decade will determine our skill as visionaries.

--------