Tombone's Computer Vision Blog: knowledge


Wednesday, August 25, 2010

Multifaceted Knowledge Representation: Ideas from Marvin Minsky

"I think a key to AI is the need for several representations of the knowledge, such that when the system is stuck (using one representation) it can jump to use another. When David Marr at MIT moved into computer vision, he generated a lot of excitement, but he hit up against the problem of knowledge representation; he had no good representations for knowledge in his vision systems." -- Marvin Minsky

Check out the full interview with Marvin Minsky here -- a must-read for anybody serious about building intelligent machines! The interview appears to be part of a larger volume: HAL's Legacy.

I believe that in order to make the enterprise of computer vision a success, we must considerably broaden our outlook on the problem. Do we really expect algorithms to delineate object boundaries in real images from statistics of patch descriptors alone, without any sort of model of the world?

I don't know about you, but I seriously want to build intelligent machines. I don't think any low-level SIFT-esque algorithm will ever "solve vision." It is a much grander picture of intelligence that I'm really after -- and successful computer vision will be a result (a component?) of a higher-level intelligent machine. Machines need to know a whole lot more than what is found in a single image -- and the necessary conceptual tools might not be present in the computer vision community.

A recurring theme in my blog is my belief that we must become renaissance men -- a fusion of *nix hackers, vision scientists, cognitive scientists, philosophers, athletes, machine learning scientists, skilled orators, and much more -- if we are to have any hope of chiseling away at the problem of computational intelligence. Minsky was a pioneer of computational intelligence, and his words revitalize my own research efforts in this direction.

Sunday, June 13, 2010

everything is misc -- torralba cvpr paper to check out

Weinberger's Everything is Miscellaneous is a delightful read -- I just finished it today while flying from PIT to SFO. It was recommended to me by my PhD advisor, Alyosha, and now I can see why! Many of the key motivations behind my current research on object representation resonate deeply with Weinberger's book.

Weinberger presents Rosch's theory of categorization (the Prototype Model) and explains how it marks a significant break from more than two thousand years of Aristotelian thought. Aristotle gave us the notion of a category -- centered around the notion of a definition. For Aristotle, every object can be stripped to its essential core and placed in its proper spot within a God's-eye, objective organization of the world. It was Rosch who showed us that categories are much fuzzier and messier than the rigid Aristotelian system suggests. Just as Copernicus single-handedly stopped the Sun and set the Earth in motion, Rosch disintegrated our neatly organized world-view and demonstrated how an individual's path through life shapes his or her concepts.

I think it is fair to say that my own ideas, as well as Weinberger's, are not so much an extension of the Roschian mode of thought as a significant break from the entire category-based way of thinking. Given that Rosch studied Wittgenstein as a student, I'm surprised her stance wasn't more extreme, more along the anti-category line of thought. I don't want to undermine her contribution to psychology and computer science in any way; she deserves nothing but praise for her remarkable research. Perhaps Wittgenstein was as extreme and iconoclastic as I like my philosophers to be, but Rosch provided us with a computational theory and not just a philosophical lecture.

From my limited knowledge of theories of categorization in psychology, whether Prototype Models or the more recent data-driven Exemplar Models, these theories are still theories of categories. Whether the similarity computations are between prototypes and stimuli or between exemplars and stimuli, the output of a categorization model is still a category. Weinberger is all about modern, data-driven notions of knowledge organization that break free from the imprisoning notion of a category. Knowledge is power, so why imprison it in rigid modules called categories? Below is a toy visualization of a web of concepts, as I imagine it. This is very much the web-based view of the world: Wikipedia is just a bunch of pages and links.

Artistic rendition of a "web of concepts"
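To make the point about prototype and exemplar models concrete, here is a minimal Python sketch. It assumes toy 2-D feature vectors, Euclidean distance, and an exponential similarity kernel in the spirit of exemplar models such as Nosofsky's Generalized Context Model; the function names, the data, and the sensitivity parameter `c` are illustrative assumptions, not taken from any particular paper. However differently the two functions compute similarity, both end by returning a single category label.

```python
import math

def euclidean(a, b):
    # Plain Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prototype_classify(stimulus, exemplars_by_label):
    # Prototype model: summarize each category by the mean of its members,
    # then return the label of the nearest prototype.
    best_label, best_dist = None, math.inf
    for label, members in exemplars_by_label.items():
        prototype = [sum(dim) / len(members) for dim in zip(*members)]
        d = euclidean(stimulus, prototype)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

def exemplar_classify(stimulus, exemplars_by_label, c=1.0):
    # Exemplar model: sum an exp(-c * distance) similarity to every stored
    # member of each category (assumed kernel), return the highest-scoring label.
    scores = {
        label: sum(math.exp(-c * euclidean(stimulus, m)) for m in members)
        for label, members in exemplars_by_label.items()
    }
    return max(scores, key=scores.get)

# Hypothetical toy data: two "categories" of stored exemplars.
exemplars = {
    "bird": [(0.9, 0.8), (1.0, 1.0), (0.2, 0.9)],
    "fish": [(-1.0, -0.8), (-0.9, -1.1)],
}
stimulus = (0.5, 0.6)

print(prototype_classify(stimulus, exemplars))  # bird
print(exemplar_classify(stimulus, exemplars))   # bird
```

The two models differ in what they store (one summary per category versus every member), but the interface is identical: stimulus in, category label out.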

I found it valuable to think of the Visual Memex, the model I'm developing in my thesis research, as an anti-categorization model of knowledge -- a vast network of object-object relationships. Using little concrete bits of information to build a rich, non-parametric web is the recurring theme of Weinberger's book. In my case, however, the problem of extracting primitives from images, along with all the other difficulties of dealing with real-world images, is around to plague me, so the Visual Memex must rely on many computer vision techniques -- topics not discussed in Weinberger's book. The "perception" or "segmentation" component of the Visual Memex is far from trivial, whereas linking words on the web is much easier.
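The contrast between a category label and a web of object-object relationships can be sketched as a small graph. The Python toy below is hypothetical: it is not the Visual Memex itself, and the `ConceptWeb` class, node names, and relation names are made up for illustration. It also assumes the hard perception/segmentation step has already produced the object nodes. A query returns a neighborhood of related objects rather than a single category.

```python
from collections import defaultdict

class ConceptWeb:
    """Hypothetical toy: object instances joined by typed links, no category nodes."""

    def __init__(self):
        # Adjacency list: node -> list of (neighbor, relation) pairs.
        self.links = defaultdict(list)

    def link(self, a, b, relation):
        # Store each relationship in both directions.
        self.links[a].append((b, relation))
        self.links[b].append((a, relation))

    def neighborhood(self, node, hops=2):
        # Breadth-first search up to `hops` steps; returns every reachable
        # node with its hop distance instead of a category label.
        seen = {node: 0}
        frontier = [node]
        for depth in range(1, hops + 1):
            next_frontier = []
            for current in frontier:
                for neighbor, _relation in self.links[current]:
                    if neighbor not in seen:
                        seen[neighbor] = depth
                        next_frontier.append(neighbor)
            frontier = next_frontier
        del seen[node]
        return seen

# Hypothetical object instances and relations.
web = ConceptWeb()
web.link("car_17", "road_03", "on_top_of")
web.link("car_17", "car_42", "visually_similar")
web.link("road_03", "building_08", "next_to")
web.link("car_42", "tree_11", "next_to")

print(web.neighborhood("car_17"))
```

Asking about `car_17` yields `road_03` and `car_42` one hop away and `building_08` and `tree_11` two hops away: a context of concrete, linked bits of information rather than the answer "car".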

CVPR paper to look out for

However, the category-based view is all around us, and I expect most of this year's CVPR papers to fit into it. One paper, co-authored by the great Torralba, looks relevant to my interests. It is yet another triumph for the category-based mentality in computer vision; in fact, one of its figures illustrates the category-based view of the world very well. Unlike in the Visual Memex, the organization in the following figure is explicit:

Exploiting Hierarchical Context on a Large Database of Object Categories


Myung Jin Choi, Joseph Lim, Antonio Torralba, and Alan S. Willsky. CVPR 2010.

Friday, March 05, 2010

Representation and Use of Knowledge in Vision: Barrow and Tenenbaum's Conclusion

To gain a better perspective on my research regarding the Visual Memex, I spent some time reading Object Categorization: Computer and Human Vision Perspectives, which contains many lovely essays on Computer Vision. The book's recently written essays by titans of Computer Vision contain a great many lessons learned from history. While such 'looking back' on vision makes for a good read, it is also worthwhile to find old works 'looking forward' and anticipating the successes and failures of the upcoming generations.

In this 'looking forward' fashion, I want to share a passage regarding image understanding systems from "Representation and Use of Knowledge in Vision," by H. G. Barrow and J. M. Tenenbaum, July 1975. This is a short paper worth reading for both graduate students and professors interested in pushing Computer Vision research to its limits. I enjoyed the succinct and motivational ending so much that it is worth repeating verbatim:

--------

III Conclusion

We conclude by reiterating some of the major premises underlying this paper:

The more knowledge the better.
The more data, the better.
Vision is a gigantic optimization problem.
Segmentation is low-level interpretation using general knowledge.
Knowledge is incrementally acquired.
Research should pursue Truth, not Efficiency.

A further decade will determine our skill as visionaries.

-------------