Showing posts with label Takeo Kanade. Show all posts

Sunday, July 24, 2011

CMU Robotics Institute's vision finds a home at Google

Congratulations to PittPatt for their recent acquisition by Google.  PittPatt, a Pittsburgh-based startup, has its roots in CMU's Robotics Institute (where I'm currently a PhD student).  Henry Schneiderman, the CEO of PittPatt, did some truly hardcore computer vision work while doing his PhD under Takeo Kanade.


Two famous papers by Henry Schneiderman and Takeo Kanade are the following:
H. Schneiderman and T. Kanade, "A Statistical Method for 3D Object Detection Applied to Faces and Cars," IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000).


H. Schneiderman and T. Kanade, "Probabilistic Modeling of Local Appearance and Spatial Relationships for Object Recognition," IEEE Conference on Computer Vision and Pattern Recognition (CVPR 1998).


Here is what the front page of PittPatt's website states:

Joining Google is the next thrilling step in a journey that began with research at Carnegie Mellon University's Robotics Institute in the 1990s and continued with the launching of Pittsburgh Pattern Recognition (PittPatt) in 2004. We've worked hard to advance the research and technology in many important ways and have seen our technology come to life in some very interesting products. At Google, computer vision technology is already at the core of many existing products (such as Image Search, YouTube, Picasa, and Goggles), so it's a natural fit to join Google and bring the benefits of our research and technology to a wider audience. We will continue to tap the potential of computer vision in applications that range from simple photo organization to complex video and mobile applications.
We look forward to joining the team at Google!
The team at Pittsburgh Pattern Recognition

Perhaps Henry's success is yet another reason to come to CMU to get a vision PhD...

Friday, February 19, 2010

Data-Driven Image Parsing With the Visual Memex: Thesis Proposal Complete!


Yesterday, I successfully gave my thesis proposal talk at CMU and it was a great experience. The feedback I obtained from my committee members was invaluable, especially the comments from Takeo Kanade. It was a great honor for me to have Takeo Kanade, one of the titans of vision, on my committee. My external member, Pietro Perona, is also a key player in object recognition, and provided some perceptive comments.

I gave my talk on my MacBook Pro using Keynote. I used the DVI output to connect to the projector, and on my own screen I was able to see the current slide as well as the upcoming slide. Using Skype, I was able to connect to Pietro in California and share the presentation screen (not my two-slide screen!) with him. This way, he was able to follow along and see the same slides as everybody in the room. Skype was a great success!

I would like to thank everybody who came to my talk!

Wednesday, January 20, 2010

Heterarchies and Control Structure in Image Interpretation

Several days ago I was reading one of Takeo Kanade's classic computer vision papers from 1977, titled "Model Representation and Control Structure in Image Understanding," and I came across a new term: heterarchy. I think motivating this concept is as important as its definition. At the representational level, Kanade does a good job of advocating the use of multiple levels of representation -- from pixels to patches to regions to subimages to objects. In addition to discussing the representational aspects of image understanding systems, Kanade analyzes different strategies for using knowledge in such systems (he uses the term control structure to signify the overall flow of information between subroutines). At one extreme is pass-oriented processing (this is Kanade's term -- I prefer the terms feed-forward or bottom-up), which relies on iteratively building higher levels of interpretation from lower ones. Marr's vision pipeline is mostly bottom-up, but that discussion will be left for another post. At the other extreme is top-down processing, where the image is analyzed in a global-to-local fashion. Of course, as of 2010 these ideas are used on a regular basis in vision. One example is the paper Learning to Combine Bottom-Up and Top-Down Segmentation by Levin and Weiss.

Kanade acknowledges that the flow of a vision algorithm depends heavily on the representation used. For image understanding, bottom-up as well as top-down processing will both be critical components of the entire system. However, the exact strategy for combining these processes, in addition to countless other mid-level stages, is not very clear. Directly quoting Kanade, "The ultimate style would be a heterarchy, in which a number of modules work together like a community of experts with no strict central executive control." According to this line of thought, processing would occur in a loopy and cooperative style. Kanade attributes the concept of a heterarchy to Patrick Winston, who worked with robots in the golden days of AI at MIT. Like Kanade, Winston criticizes a linear flow of information in scene interpretation (this criticism dates back to 1971). The basic problem outlined by both Kanade and Winston is that modules such as line-finders and region-finders (think segmentation) are simply not good enough to be used directly in subsequent stages of understanding. In my own research I have used the concept of multiple image segmentations to bypass some of the issues with relying on the output of low/mid-level processing for high-level processing. In 1971 Winston envisioned an algorithmic framework that is a melange of subroutines -- a web of algorithms created by different research groups -- that would interact and cooperate to understand an image. This is analogous to the development of an operating system like Linux. There is no overall theory developed by a single research group that made Linux a success -- it is the body of hackers and engineers producing a wide range of software that makes Linux a success.
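To make the "don't trust a single low-level module" idea concrete, here is a minimal toy sketch in Python (my own illustration for this post, not Kanade's or Winston's method, and not the actual algorithm from my research): instead of committing to one imperfect segmentation, generate several hypotheses and let them vote.

```python
import numpy as np

# Toy 1-D "image": two regions (dark, bright) corrupted by noise.
rng = np.random.default_rng(0)
signal = np.concatenate([rng.normal(0.2, 0.05, 50),   # region A
                         rng.normal(0.8, 0.05, 50)])  # region B

# Multiple segmentation hypotheses: each threshold is one (imperfect) module.
thresholds = [0.3, 0.4, 0.5, 0.6, 0.7]
hypotheses = [(signal > t).astype(int) for t in thresholds]

# Consensus: a pixel is labeled foreground if most hypotheses agree,
# so no single bad threshold can corrupt the final interpretation.
votes = np.sum(hypotheses, axis=0)
consensus = (votes > len(thresholds) / 2).astype(int)

print(consensus[:5], consensus[-5:])
```

The point of the sketch is the control structure, not the thresholding: each hypothesis is cheap and unreliable on its own, and the higher-level decision is deferred until several of them can be compared.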

Unfortunately, given the tradition of computer vision research, I believe that an open-source-style group effort in this direction will not come out of university-style research (which is overly coupled with the publishing cycle). It would be a noble effort, but it would be more of a feat of engineering than of science. Imagine a group of 2-3 people creating an operating system from scratch -- it seems like a crazy idea in 2010. However, computer vision research is often done in such small teams (actually, there is often a single hacker behind a vision project). But maybe going open-source and allowing several decades of interaction will actually produce usable image understanding systems. I would like to one day lead such an effort -- being both the theoretical mastermind as well as the hacker behind this vision. I am an INTJ, hear me roar.