
Kanade acknowledges that the flow of a vision algorithm depends heavily on the representation used. For image understanding, both bottom-up and top-down processing will be critical components of the overall system. However, the exact strategy for combining these processes, along with countless other mid-level stages, is not clear. Directly quoting Kanade, "The ultimate style would be a heterarchy, in which a number of modules work together like a community of experts with no strict central executive control." According to this line of thought, processing would occur in a loopy and cooperative style. Kanade attributes the concept of a heterarchy to Patrick Winston, who worked with robots in the golden days of AI at MIT. Like Kanade, Winston criticizes a linear flow of information in scene interpretation (this criticism dates back to 1971). The basic problem outlined by both Kanade and Winston is that modules such as line-finders and region-finders (think segmentation) are simply not good enough to be used in subsequent stages of understanding. In my own research I have used multiple image segmentations to bypass some of the issues with relying on the output of low- and mid-level processing for high-level processing. In 1971 Winston envisioned an algorithmic framework that is a melange of subroutines, a web of algorithms created by different research groups, that would interact and cooperate to understand an image. This is analogous to the development of an operating system like Linux. No overall theory developed by a single research group made Linux a success; it is the body of hackers and engineers, producing a wide range of software, that makes using Linux a success.
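To make the "loopy and cooperative" style a bit more concrete, here is a toy sketch of my own, not anything taken from Kanade or Winston. The module names, the shared "blackboard" dictionary, the edge strengths, and the thresholds are all hypothetical. The only point is that a high-level hypothesis can feed back into a low-level module, and that the loop simply gives every module a turn until nothing changes, rather than dictating a fixed pipeline order.

```python
# Toy heterarchy: modules cooperate through a shared blackboard until it stops changing.
# Every name, threshold, and data value here is a made-up illustration.

def line_finder(board):
    """Bottom-up: keep edges whose strength meets the current threshold."""
    threshold = board.get("edge_threshold", 0.5)
    lines = sorted(n for n, s in board["edge_strengths"].items() if s >= threshold)
    if board.get("lines") != lines:
        board["lines"] = lines
        return True
    return False

def object_recognizer(board):
    """Higher level: hypothesize a 'box' once at least three lines are found."""
    if len(board.get("lines", [])) >= 3 and board.get("object") != "box":
        board["object"] = "box"
        return True
    return False

def top_down_expectation(board):
    """Top-down: a 'box' hypothesis predicts a weak fourth edge, so relax the threshold."""
    if board.get("object") == "box" and board.get("edge_threshold", 0.5) > 0.3:
        board["edge_threshold"] = 0.3
        return True
    return False

def run_heterarchy(board, modules, max_rounds=10):
    """Give every module a turn each round; stop when no module changes the board."""
    for _ in range(max_rounds):
        changed = [module(board) for module in modules]
        if not any(changed):
            break
    return board

# The module order is deliberately not the "natural" pipeline order.
board = run_heterarchy(
    {"edge_strengths": {"a": 0.9, "b": 0.8, "c": 0.6, "d": 0.35}},
    [top_down_expectation, object_recognizer, line_finder],
)
print(board["lines"], board["object"])
```

In a strictly linear pipeline the weak edge "d" would be discarded by the line finder and never recovered. Here the box hypothesis loops back and rescues it, which is the kind of cooperation the heterarchy idea is after.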
Unfortunately, given the tradition of computer vision research, I believe that an open-source-style group effort in this direction will not come out of university-style research (which is overly coupled with the publishing cycle). It would be a noble effort, but it would be more a feat of engineering than of science. Imagine a group of 2-3 people creating an operating system from scratch: it seems like a crazy idea in 2010. However, computer vision research is often done in such small teams (actually, there is often a single hacker behind a vision project). But maybe going open-source and allowing several decades of interaction will actually produce usable image understanding systems. I would like to one day lead such an effort, being both the theoretical mastermind and the hacker behind this vision. I am an INTJ, hear me roar.