To gain a better perspective on my research regarding the Visual Memex, I spent some time reading Object Categorization: Computer and Human Vision Perspectives, which contains many lovely essays on Computer Vision. The essays were written recently by titans of Computer Vision and offer a great many lessons learned from history. While such a 'looking back' on vision makes for a good read, it is also worthwhile to find old works 'looking forward' and anticipating the successes and failures of the upcoming generations.
In this 'looking forward' fashion, I want to share a passage regarding image understanding systems from "Representation and Use of Knowledge in Vision," by H. G. Barrow and J. M. Tenenbaum, July 1975. This is a short paper worth reading for both graduate students and professors interested in pushing Computer Vision research to its limits. I enjoyed the succinct and motivational ending so much that it is worth repeating verbatim:
We conclude by reiterating some of the major premises underlying this paper:
The more knowledge the better.
The more data, the better.
Vision is a gigantic optimization problem.
Segmentation is low-level interpretation using general knowledge.
Knowledge is incrementally acquired.
Research should pursue Truth, not Efficiency.
A further decade will determine our skill as visionaries.
"The more knowledge the better" to some extent.
What about computational time?
Unfortunately, the current state of the art in computer vision is so far from human abilities that getting good performance is more important than computational time.
If one is dealing with a mobile robot, it seems as if computational time is important.
I foresee a future where performing object recognition in an image will be seen as being as expensive as performing a Google query in 2010. How many machines are being used in this case is not clear. But the fact that the computation is distributed will be important.
Could one imagine a robot performing 1000s of Google queries per second? I think so.
I like your optimism :). It was 1975, and we still have the same problems. 35 years! Computational power has grown hundreds of times over, but we have not moved very far.
We haven't moved closer to producing useful Image Understanding technology, but I think this is to be expected.
Given that philosophers have been arguing about the foundations of knowledge for thousands of years, why should we expect Computer Vision scientists to make significant progress in such a short amount of time?
Is vision any less intellectual than Philosophy or Theoretical Physics? I don't think it is. I personally put Vision, at its core, on the highest pedestal for good reasons. There are many deep problems in Vision, but researchers do not have diverse enough backgrounds to even start asking the right questions.
This is not a problem that will take one PhD, two PhDs, nor a single lifetime to crack. While I expect pragmatic solutions to start appearing in my lifetime, machines will not surpass us anytime soon.
I've been reading your blog for quite some time. I am currently doing my MS in Computer Vision. I happen to agree with a lot of your criticism about the state of research in computer vision, especially about how some of the research is currently being done and the need for a broad understanding of the problem from various disciplines. Keep up your good work!
I'm glad you enjoy reading about the different perspectives on vision I've discussed in my blog!