"All bodies together, and each by itself, give off to the surrounding air an infinite number of images which are all-pervading and each complete, each conveying the nature, colour and form of the body which produces it." --Leonardo da Vinci
Yesterday, Edward H. Adelson ("Ted Adelson") gave a lecture at MIT on the plenoptic function and its role in understanding (and unifying) early vision. Ted has been at MIT for quite some time. He is sometimes described as one-third human vision, one-third computer vision, and one-third computer graphics, and he was Bill Freeman's advisor.
What is the plenoptic function?
Etymology: Plenoptic comes from plenus + optic.
plenus: full, filled
optic: relating to eye or vision
Ted Adelson imagined a sort of unified field theory for vision -- instead of proposing a jungle of atoms such as edges, corners, and peaks, the plenoptic function offers a unifying principle under which color, texture, motion, etc. can all be viewed as gradients of the plenoptic function. The plenoptic function is a complete representation which contains, implicitly, a description of every possible photograph that could be taken of a particular space-time chunk of the world. Omniscience is to knowing as the plenoptic function is to seeing.
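For concreteness, here is the parametrization as I remember it from the Adelson and Bergen paper. The symbols are the paper's as best I recall, not something taken from the lecture: the intensity of light P is recorded for every viewing direction (\theta, \phi), every wavelength \lambda, every time t, and every viewing position (V_x, V_y, V_z).

```latex
P = P(\theta, \phi, \lambda, t, V_x, V_y, V_z)
```

A single photograph is then just a slice of P: fix the viewpoint and the time, integrate over a band of wavelengths, and sample a window of directions.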
Ted remarked that if you had asked him 20 years ago what he was working on in vision, you might have gotten a confusing answer. "Do you work on texture, motion, stereo, or illumination?" you might ask. "All of them. Aren't they all the same thing?" he might reply. Ted argues that vision scientists in the 80s and early 90s tried to cut up the world of vision into neat little "particles" and would develop theories with their favorite particle -- here the particles are early vision concepts such as color, texture, and motion.
In their seminal paper on the plenoptic function, "The Plenoptic Function and the Elements of Early Vision," Adelson and Bergen state that "the elemental operations of early vision involve the measurement of local change along various directions within the plenoptic function." As a theoretical device, the plenoptic function has left a long-standing impression on me. I first came across Ted's ideas back in 2006 -- thanks to Alyosha Efros' course on vision. Having just completed a BS in Physics, I was well aware of unified field theories in physics, and the plenoptic function seemed too cool to forget.
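To make "local change along various directions" a little more tangible, here is a minimal sketch in Python. Everything in it is my own toy construction, not anything from the paper or the lecture: the function `plenoptic` is a made-up drifting grating, the helper `partial` is a hypothetical name for a plain central finite difference, and I keep only five of the dimensions (x, y, wavelength, time, and one viewpoint coordinate).

```python
import math

def plenoptic(x, y, lam, t, vx):
    """Toy stand-in for a plenoptic function (an assumption, not real data).

    A vertical sinusoidal grating that drifts along x over time, shifts
    with the viewpoint vx (parallax), and has a Gaussian spectral profile
    centred at 550 nm. It does not depend on y at all.
    """
    spatial = math.sin(2 * math.pi * (x - 0.5 * t - 0.2 * vx))
    spectral = math.exp(-((lam - 550.0) / 50.0) ** 2)
    return spatial * spectral

def partial(f, point, axis, h=1e-4):
    """Central finite difference of f along one named axis at `point`."""
    lo = dict(point)
    hi = dict(point)
    lo[axis] -= h
    hi[axis] += h
    return (f(**hi) - f(**lo)) / (2 * h)

# One sample location in the (reduced) plenoptic space.
point = {"x": 0.1, "y": 0.3, "lam": 520.0, "t": 0.0, "vx": 0.0}

# Local change along each axis, measured one direction at a time.
gradients = {axis: partial(plenoptic, point, axis) for axis in point}

for axis, value in gradients.items():
    print(f"dP/d{axis} = {value:.4f}")
```

Read loosely, the change along x responds to the vertical grating, the change along y is zero because nothing varies vertically, the change along lam is a crude color measurement, the change along t picks up the drift, and the change along vx picks up parallax. In this toy the temporal and viewpoint derivatives are just scaled copies of the spatial one, which is the kind of relationship that makes it tempting to treat texture, motion, and stereo as the same thing.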
What the plenoptic function means to me
However, if the plenoptic function is the equivalent of Maxwell's equations for early (low-level) vision, then what I'm ultimately after is the Schrödinger equation of late (high-level) vision. In his lecture, Ted Adelson acknowledged that vision scientists have a sort of Atom Envy -- they envy the physicists who are able to understand the world in terms of a few fundamental, ontologically meaningful entities. First of all, I like particles, but I have no a priori reason to be in the particle camp all of my life. Secondly, the plenoptic function was all about early vision, but my research in vision is all about high-level vision such as object recognition. I might be young and foolish, but the search for a "mind mechanics" has been a part of my research life (at least partially) since ~2003. Right now, my best shot at an answer is that exemplars and associations are the basic building blocks of high-level vision -- but unlike the British Empiricists (the champions of associationism), I would argue that the atomic building blocks of associations are object instances, and not ideas such as "roundness" and "blueness". Complex ideas are then the object categories which arise out of the interactions between these concrete elements of experience.
Conclusion
The Adelson and Bergen paper is a must-read for anybody serious about vision. While it might not offer much in terms of "what next" in vision research, it is nevertheless a useful construct for thinking about vision. I get excited when it comes down to unifying principles, and I wish there were more papers like this in vision, especially for high-level vision.