Thursday, November 27, 2014

Barcodes: Realtime Training and Detection with VMX

In this VMX screencast, witness the creation of a visual barcode detection program in under 9 minutes. You can see the entire training procedure -- creating an initial data set of labeled barcodes, improving the detector via a 5 minute interactive learning step, and finishing off with a qualitative evaluation of the trained barcode detector.


The inspiration came after reading Dr. Rosebrock's blog post on detecting barcodes using OpenCV and Python (http://www.pyimagesearch.com/2014/11/24/detecting-barcodes-images-python-opencv/).  While the code presented in Rosebrock's blog post is quite simple, it is most definitely domain-specific: a different hand-crafted program must be constructed for each kind of object.  In other words, different morphological operations, features, and thresholds must be used to detect different objects, and it is not even clear how you would construct the rules to detect a complex object such as a "monkey."  If you are just getting started with programming and want to learn how to construct some of these domain-specific programs, you'll have to subscribe to http://www.pyimagesearch.com/.
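To make the "domain-specific" point concrete: Rosebrock's actual pipeline uses OpenCV (gradients, blurring, thresholding, morphological closing), but the core hand-crafted intuition can be sketched in a few lines of plain NumPy. A 1-D barcode is a region dense in strong horizontal intensity transitions, so we can score windows by their horizontal-gradient energy and keep the best one. The function, window size, and synthetic test image below are my own illustrative choices, not code from his post:

```python
import numpy as np

def find_barcode_region(gray, win=32):
    """Return (row, col) of the window with the most horizontal gradient energy."""
    # Strong left-right intensity transitions are the signature of a 1-D barcode.
    gx = np.abs(np.diff(gray.astype(float), axis=1))
    best_score, best_rc = -1.0, (0, 0)
    # Slide a win x win window with 50% overlap and keep the highest-energy one.
    for r in range(0, gx.shape[0] - win + 1, win // 2):
        for c in range(0, gx.shape[1] - win + 1, win // 2):
            score = gx[r:r + win, c:c + win].sum()
            if score > best_score:
                best_score, best_rc = score, (r, c)
    return best_rc

# Synthetic test: flat gray background with a stripe pattern standing in for a barcode.
img = np.full((128, 128), 128, dtype=np.uint8)
img[40:80, 60:110] = 255 * (np.arange(50) % 4 < 2)  # vertical stripes
r, c = find_barcode_region(img)  # the returned window overlaps the stripe region
```

This toy rule works for stripes but says nothing about monkeys, which is exactly the limitation of domain-specific vision programs.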

Writing these kinds of vision programs is hard.  Unless... you address the problem with some advanced machine learning techniques.  Applying machine learning to visual problems is "the backbone" of what we do at vision.ai and computer vision research has been a personal passion of mine for over a decade.  So I decided to take our most recent piece of vision tech for a spin.  We try not to code while on vacation (a good team needs good rest), and I don't consider using our GUI-based VMX software as hardcore as "coding."  Unlike traditional vision systems whose operation might leave you with an engineering-hangover, using VMX is more akin to playing Minecraft.  I figured that playing a video game or two on vacation is permissible.

Eliminating the residual sunscreen from my hands, I rebooted my soul with an iced gulp of Spice Isle Coffee and fired up my trusty Macbook Pro.  I then grabbed the first few vacation-themed objects from the kitchen. (And yes, I'm on vacation for Thanksgiving -- the objects include canned fruit, sunscreen, and a bottle of booze.)  Then it was time to throw the barcode detection problem at VMX.

Step 1: Barcode Initial Selections
30 seconds worth of initial clicks followed by several minutes worth of waving objects in front of the webcam is not hard work.  5 minutes later we have a sexy barcode detector.  Not too bad for computer vision in a non-laboratory setting.  While on vacation, I don't have access to a lab and neither should you.  A sun-filled patio will have to suffice.  In fact, it was so bright outside that I had to wear sunglasses the entire time. (Towards the end of the video, a "sunglasses" detector makes a cameo.)

Please note that the barcode is not actually "read" (so this program can't tell whether the region corresponds to canned pineapples or sunscreen); the region of interest is simply detected and tracked in real-time.

Final Step: Tweaking Learned Positives and Negatives
This video is an example of a pure machine-learning based approach to barcode detection.  The underlying algorithm can be used to learn just about any visual concept you're interested in detecting.  A bar code is just like a face or a car -- it is a 2D pattern which can be recognized by machines.  Throughout my career I've trained thousands of detectors (mostly in an academic setting).  VMX is the most fun with object recognition I've ever had and it lets me train detectors without having to worry about the mathematical details.  Once you get your own copy of VMX, what will you train?

To learn how to get your hands on VMX, sign up on the mailing list at http://vision.ai or if you're daring enough, you can purchase an early beta license key from https://beta.vision.ai.

So what's next?  Should I build a boat detector? Maybe I should train a detector to let me know when I run low on Spice Isle Coffee? Or how about going on a field trip and counting bikinis on the beach?

Tuesday, August 16, 2011

Question: What makes an object recognition system great?

Today, instead of discussing my own perspectives on object recognition or sharing some useful links, I would like to ask a general question geared towards anybody working in the field of computer vision:

What makes an object recognition system great?

In particular, I would like to hear a broad range of perspectives regarding what is necessary to provide an impact-creating open-source object recognition system for the research community to use.  As a graduate student you might be interested in building your own recognition system, as a researcher you might be interested in extending or comparing against a current system, and as an educator you might want to direct your students to a fully-functional object recognition system which could be used to bootstrap their research.



To start the discussion I would like to first enumerate a few elements which I find important in making an object recognition system great.

Open Source
In order for object recognition to progress, I think releasing binary executables is simply not enough.  Allowing others to see your source code means that you gain more scientific credibility and you let others extend your system -- this means letting others both train and test variants of your system.  More people using an object recognition system also translates to a higher citation count, which is favorable for researchers seeking career advancement.  Felzenszwalb et al. have released multiple open-source versions of their Discriminatively Trained Deformable Part Model -- each time we see a new release it gets better!  Such continual development means that we know the authors really care about this problem.  I feel Github, with its distributed version control and social-coding features, is a powerful tool the community should adopt, and one which I believe is very much needed to take the community's ideas to the next level.  In my own research (e.g., the Ensemble of Exemplar-SVMs approach), I have started using Github (for both private and public development) and I love it.  Linux might have been started by a single individual, but it took a community to make it great.  Just look at where Linux is now.

Ease of use
For ease of use, it is important that the system is implemented in a popular language known by a large fraction of the vision community.  Matlab, Python, C++, and Java are such popular languages, and many good implementations are a combination of Matlab with some highly-optimized routines in C++.  Good documentation is also important, since one cannot expect only experts to be using such a system.

Strong research results
The YaRS approach, which is the "yet-another-recognition-system" approach, doesn't translate to high usage unless the system actually performs well on a well-accepted object recognition task.  Every year at vision conferences, many new recognition frameworks are introduced, but only a few of them ever pass the test of time.  Usually an idea withstands time because it is a conceptual contribution to science, but systems such as the HOG-based pedestrian detector of Dalal-Triggs and the Latent Deformable Part Model of Felzenszwalb et al. are actually being used by many other researchers.  The ideas in these works are not only good, but the recognition systems are great.

Question:
So what would you like to see in the next generation of object recognition systems?  I will try my best to reply to any comments posted below.  Any really great comment might even trigger a significant discussion; enough to warrant its own blog post.  Anybody is welcome to comment/argue/speculate below, either using their real name or anonymously.