The role of metaphysics in the field of computer vision cannot be forgotten. Today I sat in my first class of the Spring 2006 semester. I'm very excited about this course, titled Advanced Robot Perception (or Advanced Machine Perception, but everybody calls it Advanced Perception anyway), which is being taught by my research advisor, Alexei Efros.
While sitting in class and listening to Alexei talk, I remembered my first day of 11th-grade high school honors English. On that first day of English class, the teacher had written a quote on the blackboard: "The window through which we peer circumscribes the world we see." In some sense, this quotation captures my internal philosophy very well. As a pragmatist whose scientific outlook on life has been shaped by the philosophies of Kuhn, Popper, and Rorty, I hold on to a somewhat wishy-washy concept of truth. Perhaps I started reading Descartes when I was a bit too young, but I'm simply a sucker for Cartesian hyperbolic doubt. Perhaps I don't agree with the common man's worldview, perhaps I doubt the existence of the world outside of my head, or perhaps I'm simply not willing to tell my vision system what the world is made up of.
My fascination with unsupervised techniques in computer vision is directly related to my pragmatic philosophy. Perhaps there is nothing wrong with using the objects that we have words for when we (humans) communicate, but I'm skeptical that these high-level objects are the ones we directly perceive. If we want to ask a vision system something about the visual world using natural language, then the vision system will clearly have to translate the English-word concept into its own internal representation of the objects it sees. However, if we want to build vision systems that can interact with the world on their own, with no need to 'talk to the machines' in natural language, then why should we impose a language-derived structure on their internal representation? Why should we impose a 1:1 correspondence between the objects of language and the objects of perception?
Clearly, a hierarchical representation of objects is necessary, since a flat, linear structure simply does not scale to the large number of objects present in the world. I'm currently thinking about image-level primitives that could be used to construct such a hierarchical representation.
Wikipedia puts it nicely: "A major concern of metaphysics is a study of the thought process itself: how we perceive, how we reason, how we communicate, how we speculate, and so on." I want to build robotic metaphysicians so that I can ask them, 'What is the meaning of life?' Even though I know that the answer will be 42, I think it will be a fun journey.