Tuesday, August 16, 2011

Question: What makes an object recognition system great?

Today, instead of discussing my own perspectives on object recognition or sharing some useful links, I would like to ask a general question geared towards anybody working in the field of computer vision:

What makes an object recognition system great?

In particular, I would like to hear a broad range of perspectives on what it takes to provide a high-impact open-source object recognition system for the research community to use.  As a graduate student you might be interested in building your own recognition system, as a researcher you might be interested in extending or comparing against a current system, and as an educator you might want to direct your students to a fully functional object recognition system which could be used to bootstrap their research.



To start the discussion I would like to first enumerate a few elements which I find important in making an object recognition system great.

Open Source
In order for object recognition to progress, I think releasing binary executables is simply not enough.  Allowing others to see your source code means that you gain more scientific credibility and you let others extend your system, which means letting them both train and test variants of it. More people using an object recognition system also translates to a high citation count, which is favorable for researchers seeking career advancement.  Felzenszwalb et al. have released multiple open-source versions of their Discriminatively Trained Deformable Part Model, and each new release gets better!  Such continual development means that we know the authors really care about this problem.  I feel Github, with its distributed version control and social-coding features, is a powerful tool the community should adopt, and one which I believe is very much needed to take the community's ideas to the next level.  In my own research (e.g., the Ensemble of Exemplar-SVMs approach), I have started using Github (for both private and public development) and I love it. Linux might have been started by a single individual, but it took a community to make it great.  Just look at where Linux is now.

Ease of use
For ease of use, it is important that the system is implemented in a popular language which is known by a large fraction of the vision community.  Matlab, Python, C++, and Java are such languages, and many good implementations combine Matlab with some highly optimized routines in C++.  Good documentation is also important, since one cannot expect only experts to be using such a system.

Strong research results
The YaRS ("yet-another-recognition-system") approach doesn't translate to high usage unless the system actually performs well on a well-accepted object recognition task.  Every year at vision conferences, many new recognition frameworks are introduced, but only a few of them ever stand the test of time.  Usually an idea withstands time because it is a conceptual contribution to science, but systems such as the HOG-based pedestrian detector of Dalal and Triggs and the Latent Deformable Part Model of Felzenszwalb et al. have lasted because they are actually being used by many other researchers.  The ideas in these works are not only good, but the recognition systems are great.

Question:
So what would you like to see in the next generation of object recognition systems?  I will try my best to reply to any comments posted below.  A really great comment might even trigger a discussion significant enough to warrant its own blog post.  Anybody is welcome to comment, argue, or speculate below, either using their real name or anonymously.