Friday, May 12, 2006

the killer app of computer vision

What is the killer application of computer vision? In other words, how useful are machines that can visually detect objects in images?

The easiest application to think of is image retrieval. For this application, a user specifies either an image or some text, and the system returns new images that are somehow related to the input. The returned images also come with some type of information that relates them to the input. Surely companies like Google would be interested in such applications, but isn't there more that we could get out of computer vision?
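To make the retrieval setup concrete, here is a minimal sketch in Python. It assumes each image has already been reduced to a small feature vector (the hard part, and not shown here); the file names, the three-number features, and the `retrieve` helper are all made up for illustration. The similarity score returned with each image plays the role of the "information that relates them to the input."

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, database, k=2):
    """Return the k database images most similar to the query, with their scores."""
    scored = [(cosine_similarity(query, feat), name) for name, feat in database.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(name, score) for score, name in scored[:k]]

# Hypothetical precomputed features; a real system would extract these from pixels.
database = {
    "beach.jpg": [0.9, 0.1, 0.0],
    "forest.jpg": [0.1, 0.9, 0.1],
    "sunset.jpg": [0.8, 0.2, 0.3],
}

results = retrieve([1.0, 0.0, 0.1], database, k=2)
for name, score in results:
    print(f"{name}: similarity {score:.3f}")
```

A text query would work the same way, provided the text could be mapped into the same feature space as the images, which is itself an assumption.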

When I was younger I was very interested in particle physics, and I even finished my undergrad with a dual degree in Computer Science and Physics. I was impressed with the way that computational techniques could be used to 'get at' the world. Large-scale simulations and data analysis could be used to infer the structure of the world (or at least, given some structure, to fit the necessary parameters).

Could we train machines that can infer relationships between objects in the world? Can a machine infer Newtonian-like properties (and thus establish a metaphysics) of the world such as mass and gravity from visual observations? I think the big question here is the following: can we train machines to 'see' objects without those machines first understanding any properties of the dynamic world? When I say 'properties of the dynamic world,' I do not mean appearance variations, but things like 'objects have mass and things with mass do not just float in ambient space,' and 'things in motion tend to stay in motion.'
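As a toy illustration of fitting a physical parameter from observations, the sketch below estimates gravitational acceleration from the height of a falling object sampled at a fixed frame interval. It assumes the object has already been detected and tracked in every frame, which is exactly the part that the question above is about; the heights here are synthetic, and `estimate_gravity` is a hypothetical helper rather than any established method.

```python
def estimate_gravity(heights, dt):
    """Estimate downward acceleration from uniformly sampled heights (meters).

    Uses the central second difference y[i+1] - 2*y[i] + y[i-1] = a * dt**2,
    averaged over all interior samples. Heights are measured upward, so a
    falling object has negative acceleration and g = -a.
    """
    if len(heights) < 3:
        raise ValueError("need at least three samples")
    second_diffs = [
        heights[i + 1] - 2.0 * heights[i] + heights[i - 1]
        for i in range(1, len(heights) - 1)
    ]
    mean_accel = sum(second_diffs) / len(second_diffs) / (dt * dt)
    return -mean_accel

# Synthetic "tracked" heights: an object dropped from 10 m, sampled every 0.1 s.
dt = 0.1
true_g = 9.81
heights = [10.0 - 0.5 * true_g * (i * dt) ** 2 for i in range(10)]

g_est = estimate_gravity(heights, dt)
print(f"estimated g = {g_est:.3f} m/s^2")
```

Note that this only fits a parameter once the structure (constant acceleration) is given. Discovering that structure from pixels alone is a much harder problem.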