Vision (a.k.a. image understanding, image interpretation, perception, object recognition) is quite unlike some of the mathematical problems we were introduced to in our youth. In fact, thinking of vision as a "mathematical problem" in the traditional sense is questionable. An important characteristic of such "problems" is that by pointing them out we already have a notion of what it would be like to solve them. Does a child think of gravity as such a problem? Well, probably not, because without the necessary mathematical backbone there is no problem with gravity! It's just the way the world works! But once a child has been perverted by mathematics and introduced into the intellectual world of science, the world ceases to just be. The world becomes a massive equation.
Consider the seemingly elementary problem of finding the roots of a cubic polynomial. Many of us can recite the quadratic formula by heart, but not the one for cubics (try deriving even the simpler quadratic formula by hand). If we were given one evening and a whole lot of blank paper, we could try tackling this problem (no Google allowed!). While the probability of failure is quite high (arguably most of us would fail), it would still make sense to speak of "coming closer to the solution". Maybe we could even solve the problem when some terms are missing, etc. The important thing here is that the notion of having reached a solution is well-defined. Also, once we've found a solution, it would probably be easy to convince ourselves that it is correct (verifying a solution is easier than coming up with it).
Vision is more like theoretical physics, psychology, and philosophy and less like the well-defined math problem I described above. When dealing with that math problem, we know what the symbols mean and we know the valid operations -- the game is already set in place. In vision, just as in physics, psychology, and philosophy, the notion of a fundamental operational unit (which happens to be an object for vision) isn't as rigidly defined as the Platonic Ideals used throughout mathematics. We know what a circle is, we know what a real-valued variable is, but what is a "car"? Consider your mental image of a car. Now remove a wheel and ask yourself, is this still a car? Surely! But what happens as we start removing more and more elements? At what point does this object cease to be a car and become a motor, a single tire, or a piece of metal? The circle, a Platonic Ideal, ceases to be a circle once it has suffered the most trivial of all perturbations -- any deviation from perfection, and boom! The circle is no longer a circle.
Much of Computer Vision does not ask such metaphysical questions, as objects of the real world are seamlessly mapped to abstract symbols that our mathematically-inclined PhD students love to play with. I am sad to report that this naive mapping between objects of the real world and mathematical symbols isn't so much a question of style; it is basically the foundation of modern computer vision research. So what must be done to expand this parochial field of Vision into a mature field? Wake up and stop coding! I think Vision needs a sort of mental coup d'état, a fresh outlook on an old problem. Sometimes to make progress we have to start with a clean slate -- current visionaries do not possess the right tools for this challenging enterprise. Instead of throwing higher-level mathematics at the problem, maybe we should ask whether we are barking up the wrong tree. However, if mathematics is the only thing we are good at, then how are we to have a mature discussion which transcends mathematics? The window through which we peer circumscribes the world we see.
I believe if we are to make progress in this challenging endeavor, we must first become Renaissance men, a sort of Nietzschean Übermensch. We must understand what has been said about perception, space, time, and the structure of the universe. We must become better historians. We must study not only more mathematics, but more physics, more psychology, read more Aristotle and Kant, build better robots, engineer stabler software, become better sculptors and painters, become more articulate orators, establish better personal relationships, etc. Once we've mastered more domains of reality, and only then, will we have a better set of tools for coping with the paradoxes inherent in artificial intelligence. A better grasp on reality -- inching closer to enlightenment -- will result in asking more meaningful questions.
I am optimistic. But the enterprise which I've outlined will require a new type of individual, one worthy of the name Renaissance Man. We aren't interested in toy problems here, nor cute solutions. If we want to make progress, we must shape our lives and outlooks around this very fact. Two steps backwards and three steps forward. Lather, rinse, repeat.