Tuesday, November 04, 2008

Computer Vision as immature?

Ashutosh Saxena points out on his web page an interesting quote from Wikipedia about Computer Vision from August 2007. I just checked out the Wikipedia article on Computer Vision, and it seems this paragraph is still there. Parts of it go as follows:

The field of computer vision can be characterized as immature and diverse ... Consequently there is no standard formulation of "the computer vision problem." ... no standard formulation of how computer vision problems should be solved.

I agree that there is no elegant equation akin to F=ma or Schrodinger's Wave Equation that is magically supposed to explain how meaning is supposed to be attributed to images. While this might seem like a weak point, especially to the mathematically inclined always seeking to generalize and abstract away, I am skeptical of Computer Vision ever being grounded in such an all-encompassing mathematical theory.

Being a discipline centered on perception and reasoning, there is something about Computer Vision that will make it forever escape formalization. State of the art computer vision systems that operate on images can return many different types of information. Some systems return bounding boxes of all object instances from a single category, some systems break up the image into regions (segmentation) and say nothing about object classes/categories, and other systems assign a single object-level category to the entire image without performing any localization/segmentation. Aside from objects, some systems (See Hoiem et al. and Saxena et al.) return a geometric 3D layout of the scene. While it seems that humans can do extremely well at all these tasks, it makes sense that different robotic agents interacting with the real world should percieve the world differently to accomplish their own varying tasks. Thing of biological vision -- do we see the same world as dogs? Is there an objective observer-independent reality that we are supposed to see? To me, perception is very personal, and while my hardware (brain) might appear similar to another human's I'm not convinced that we see/perceive/understand the world the same way.

I can imagine ~40 years ago researchers/scientists trying to come up with an abstract theory of computation that would allow one to run arbitrary computer programs. What we have today is myriad operating systems and programming languages suited for different crowds and different applications. While the humanoid robot in our living room is nowhere to be found, I believe if we wait until that day and inspect its internal working we will not see a beautiful rigorous mathematical theory. We will see AI/mechanical components developed by different researcher groups and integrated by other researchers -- the fruits of a long engineering effort. These bots will be always learning, always updating, always getting updates, and always getting replaced by newer and better ones.