Tombone's Computer Vision Blog: 2014

Thursday, November 27, 2014

Barcodes: Realtime Training and Detection with VMX

In this VMX screencast, witness the creation of a visual barcode detection program in under 9 minutes. You can see the entire training procedure -- creating an initial data set of labeled barcodes, improving the detector via a 5 minute interactive learning step, and finishing off with a qualitative evaluation of the trained barcode detector.

The inspiration came after reading Dr. Rosebrock's blog post on detecting barcodes using OpenCV and Python (http://www.pyimagesearch.com/2014/11/24/detecting-barcodes-images-python-opencv/). While the code presented in Rosebrock's blog post is quite simple, it is most definitely domain-specific. Different domain-specific programs must be constructed for different objects. In other words, different kinds of morphological operations, features, and thresholds must be used for detecting different objects and it is not even clear how you would construct the rules to detect a complex object such as a "monkey." If you are just getting started with programming and want to learn how to construct some of these domain-specific programs, you're just going to have to subscribe to http://www.pyimagesearch.com/.

Writing these kinds of vision programs is hard. Unless... you address the problem with some advanced machine learning techniques. Applying machine learning to visual problems is "the backbone" of what we do at vision.ai and computer vision research has been a personal passion of mine for over a decade. So I decided to take our most recent piece of vision tech for a spin. We try not to code while on vacation (a good team needs good rest), and I don't consider using our GUI-based VMX software as hardcore as "coding." Unlike traditional vision systems whose operation might leave you with an engineering-hangover, using VMX is more akin to playing Minecraft. I figured that playing a video game or two on vacation is permissible.

Eliminating the residual sunscreen from my hands, I rebooted my soul with an iced gulp of Spice Isle Coffee and fired up my trusty Macbook Pro. I then grabbed the first few vacation-themed objects from the kitchen. (And yes, I'm on vacation for Thanksgiving -- the objects include canned fruit, sunscreen, and a bottle of booze.) Then it was time to throw the barcode detection problem at VMX.

Step 1: Barcode Initial Selections

30 seconds worth of initial clicks followed by several minutes worth of waving objects in front of the webcam is not hard work. 5 minutes later we have a sexy barcode detector. Not too bad for computer vision in a non-laboratory setting. While on vacation, I don't have access to a lab and neither should you. A sun-filled patio will have to suffice. In fact, it was so bright outside that I had to wear sunglasses the entire time. (Towards the end of the video, a "sunglasses" detector makes a cameo.)

Please note that the barcode is not actually "read" (so this program can't tell whether the region corresponds to canned pineapples or sunscreen); the region of interest is simply detected and tracked in real-time.

Final Step: Tweaking Learned Positives and Negatives

This video is an example of a pure machine-learning based approach to barcode detection. The underlying algorithm can be used to learn just about any visual concept you're interested in detecting. A barcode is just like a face or a car -- it is a 2D pattern which can be recognized by machines. Throughout my career I've trained thousands of detectors (mostly in an academic setting). VMX is the most fun I've ever had with object recognition and it lets me train detectors without having to worry about the mathematical details. Once you get your own copy of VMX, what will you train?

To learn how to get your hands on VMX, sign up on the mailing list at http://vision.ai or if you're daring enough, you can purchase an early beta license key from https://beta.vision.ai.

So what's next? Should I build a boat detector? Maybe I should train a detector to let me know when I run low on Spice Isle Coffee? Or how about going on a field trip and counting bikinis on the beach?

Sunday, October 26, 2014

VMX is ready

I haven't posted anything here in the last few months, so let me give you guys a brief update. VMX has matured since the Prototype stage last year and the vision.ai team has already started circulating some beta versions of our software.

For those of you who don't remember, last year I decided to leave my super-academic life at MIT and go the startup-route focusing on vision, learning, and automation. Our goal is to make building and deploying vision applications as easy as pie. We want to be the Heroku of computer vision. Personally, I've always wanted to expose the magic of vision to a broader audience. I don't know if the robots of the future are going to have two legs, four arms, or they will forever be airborne -- but I can tell you that these creatures are going to have to perceive the world around them. 2014 is not a bad place to be for a vision company.

VMX, the suite of vision and automation tools which we showcased last year in our Kickstarter campaign, is going live very soon. VMX will be vision.ai's first product. While VMX doesn't do everything vision-related (there's OpenCV for that), it makes training visual object detectors really easy. Whether you're just starting out with vision or AI, have a killer vision-app idea, or want to automate more things in your home, you're gonna want to experience VMX yourself.

We will be providing a native installer for Mac OS X as well as a single-command installer for Linux machines based on Docker. VMX will run on your machine without an internet connection (the download plus all dependencies plus all necessary pre-trained files is approximately 2GB and an activation license will cost between $100 and $1000). The VMX App Builder runs in your browser, is built in AngularJS, and our REST API will allow you to write your own scripts/apps in any language you like. We even have lots of command line examples if you're a curl-y kind of guy/gal. If there's sufficient demand, we'll work on a native Windows installer.

We have been letting some of our close friends and colleagues beta-test our software and we're confident you're going to love it. If you would like to beta-test our software, please sign up on the vision.ai mailing list and send us a beta-key request. We have a limited number of beta-testing keys, so I'm sorry if we don't get back to you. If you want a hands-on demo by one of the VMX creators, we are more than happy to take a hacking break and show off some VMX magic. We can be found in Boston, MA and/or Burlington, VT. If you're thinking of competing in a Hackathon near one of our offices, drop us a line, we'll try to send a vision.ai jedi your way.

Geoff has been championing Docker for the last year and he's done amazing things Dockerizing our build pipeline while I refactored the vision engine API using some ideas I picked up from Haskell, and made considerable performance tweaks to the underlying learning algorithm. I spent a few months toying with different deep network representations, and modernized the internal representation so I can find another deep learning guru to help us out with R&D in 2015.

4 VMXserver processes running on a MacBook Pro

We're going to release plenty of pre-trained models plus all the tools and video tutorials you'll need to create your own models from scratch.

We will be offering a $100 personal license and a $1000 professional license of VMX. Beta testers get a personal license in return for helping find installation bugs. Internally, we are at version 0.1.3 of VMX and once we attain 90%+ code coverage we will have VMX 1.0 sometime in early 2015. We typically release stable versions every month and bleeding edge development builds every week.

The future of vision.ai

In the upcoming months, we'll be perfecting our cloud-based deployment platform, so if you're interested in building on top of our vision.ai infrastructure or want to have fun running some massively parallel vision computations with us, just shoot us an email.

Monday, January 20, 2014

Sponsor Your Favorite Object Detector + VMX Smile Detector

Many of you asked if the VMX Project will come with an initial set of object detectors. Yes! VMX will come equipped with a library of pre-trained object detectors. We are committed to providing you with an amazing VMX computer vision experience and want to give you as much as possible when you start using VMX.

Today, we’d like to introduce a special “sponsor your favorite object detector” reward. We’re introducing a new $300 pledge level to our Kickstarter page, one which lets you sponsor an object detector that will come inside the VMX pre-trained object library. In addition to sponsorship, you will obtain all the other perks of being a $300 level backer: 650 Compute Hours, a local VMX install, early-access, the VMX cookbook, and VMX developer status. By sponsoring a detector, your name will appear inside the model library when a VMX user mouses over your favorite detector. This is your chance to make a pledge which will have an ever-lasting effect on our project. Consider the number of people who will at some point use a generic car detector! Each time they visit the VMX model library, you will have your own claim to fame. “Look mom, I sponsored the car detector!”

We have 100 slots for the $300 “sponsor an object detector” reward, and the name of the backer sponsoring an object detector will appear as you mouse over the object model in the model library. This way, your name will be inside the VMX webapp model library, in addition to the wall of backers on our company page. You will be able to choose your name, your best friend’s name, your twitter handle (such as @quantombone), or your nickname. Sorry, no profanity allowed.

We will release the list of 100 object detectors which will come with VMX at the end of January. Sponsors will get the chance to choose their object detectors on a first-come-first-serve basis. If you are the first one to become a sponsor, you will get to choose “face,” “car,” “guitar” or whatever other object you might be excited about! As always, you can change your pledge level and reward. So act now and don’t forget that by sponsoring an object detector you are helping our dream project come to life!

And for those of you interested in seeing more VMX action shots, here's a new video showing off VMX detecting smiles. This one was taken with Tom's iPhone because the screencapture software on his computer slows everything down. No post-processing, this is as fast as the prototype runs. Enjoy!

(Cross-posted from the VMX Project Kickstarter Update #10)

Tuesday, January 14, 2014

10% of our Kickstarter campaign total will go to free High School Student technology licenses

Dear Kickstarters, technology enthusiasts, and STEM educators,

We’re happy to announce a new reward in our Kickstarter project, one designed for free access of our robotic vision technology to high school students. If we reach our Kickstarter campaign milestone of $100K, we will give 10% of Kickstarter generated funds to high school students and clubs in the form of software licenses. $100K raised will translate to 100 single-machine VMX licenses given out to 100 different high schools and clubs during the Summer of 2014, free of charge. Optionally, qualifying high schools can choose to claim 100 VMX Compute hours if they have a problem with local performance, don’t have access to a Linux machine, and/or their security policy doesn’t allow virtual machines.

Our Kickstarter project, the VMX Project, is an easy-to-use and fully trainable computer vision programming environment. With VMX, you can teach your computer to recognize objects using the webcam. We’ve already surpassed the 30% funding milestone and generated lots of great ideas from our community. Ideas ranging from medical disease diagnosis and 3D object reconstruction to smart wine inventory management. By bringing a computer vision app-building environment to students, we’re excited about the prospect of giving teens a sandbox for innovation -- an ecosystem to achieve their own technology-oriented Eureka moments. So whether a student decides to study computer science in college or comes up with the next great startup idea, we want to give them a headache-free entry to the world of computer vision.

If you want to learn more about the VMX Project, please see our Kickstarter page:

http://www.kickstarter.com/projects/visionai/vmx-project-computer-vision-for-everyone

The VMX High School Program is designed to give a limited number of students and student clubs free access to VMX in-browser object recognition technology. We understand that “Computer Vision for Everyone” needs to include a broader range of individuals, individuals with little or no spending income. We’re committed to letting those who can be most influenced by new technology, the young innovators inside our classrooms, get access to our technology.

By supporting our Kickstarter campaign, you are backing our vision of bringing computer vision technology to the masses. So whether you want VMX for your own creative use or want to give something to your community, we hope you’ll appreciate our new VMX Project High School Program reward and back our project. In addition, backers of our project will be able to donate any of their unused Compute Hours into the Eureka fund so that additional high school students get access to our technology.

If you are a high school student or high school teacher and would like to get some cool computer vision technology for your school, please send an email to “admin@vision.ai” with “VMX Project High School Program” in the title, briefly describing what you’d like to do with VMX, your age, and school name. To generate interest among your students and friends, share our VMX Kickstarter video with your classroom and have one of your students email us with their idea.

Kickstarter is all-or-nothing, so we need to reach the $100K funding milestone to make this project a reality.

We are excited that as software developers, our creations have the potential to spread rapidly. But we want to make sure that one of our most valuable demographics, creative high school students, isn’t left behind. Help spread the word about VMX using social networking and let’s make 2014 the year of new technology by bringing computer vision technology to the masses.

Sincerely,

Tomasz Malisiewicz, PhD

Co-Founder of vision.ai

(Cross-posted from VMX Project kickstarter blog Entry #8)

Sunday, January 12, 2014

Can a person-specific face recognition algorithm be used to determine a person's race?

It's a valid question: can a person-specific face recognition algorithm be used to determine a person's race?

I trained two separate person-specific face detectors. For each detector I used videos of the target person's face to generate positive examples and faces from [google image search for "faces"] as negative examples. This is a fairly straightforward machine learning problem: find a decision boundary between the positive examples and the negative examples. I used the VMX Project recognition algorithm which learns from videos with minimal human supervision. In both cases, I used the VMX webapp for training (training each detector took about 20 minutes from scratch). In fact, I didn't even have to touch the command line. Since videos were used as input, what I created are essentially full-blown sliding window detectors, meaning that they scan an entire image and can even find small faces. I then ran each detector on the large average male face image. This average face image has been around the internet for a while now and it was created by averaging people's faces. By running the algorithm on this one image, it analyzed all of the faces contained inside and I was able to see which country returned the highest scoring detection!
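To make "sliding window detector" concrete, here is a minimal sketch of how such a detector could enumerate candidate boxes over an image pyramid. It is not the VMX implementation: the function name, the window size, the stride, and the pyramid step are all assumptions chosen for illustration, and the scoring of each box by a trained model is left out.

```python
def sliding_windows(img_w, img_h, win=32, stride=16, scale_step=0.5, min_scale=0.25):
    """Yield (x, y, size) square boxes in original-image pixels.

    Hypothetical helper: a fixed win x win template is slid over
    progressively shrunken copies of the image, so the full-resolution
    level produces the smallest boxes and coarser levels produce larger ones.
    """
    scale = 1.0
    while scale >= min_scale:
        level_w, level_h = int(img_w * scale), int(img_h * scale)
        for y in range(0, level_h - win + 1, stride):
            for x in range(0, level_w - win + 1, stride):
                # Map the box back to coordinates of the original image.
                yield (x / scale, y / scale, win / scale)
        scale *= scale_step


boxes = list(sliding_windows(128, 128))
smallest = min(size for _, _, size in boxes)
largest = max(size for _, _, size in boxes)
print(len(boxes), smallest, largest)
```

A detector would score every one of these boxes and keep the highest scoring ones, which is why small faces are found at the fine levels and large faces at the coarse levels.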

Experiment #1
For the first experiment, I used a video of my own face. Because I was using a live video stream, I was able to move my face around so that the algorithm saw lots of different viewing conditions. Here is the output. Notice the green box around "Poland." Pretty good guess, especially since I moved from Poland to the US when I was 8.

Here is a 5 min video (VMX screencapture) of me running the "Tomasz" (that's my name in case you don't know) detector as I fly around the average male image. You can see the scores on lots of different races. High scoring detections are almost always on geographically relevant races.

Experiment #2
For the second target, I used a few videos of Andrew Ng to get positives. For those of you who don't know, Andrew Ng is a machine learning researcher, entrepreneur, professor at Stanford, and MOOC visionary. Here is the result. Notice the green box around "Japan." Very reasonable answer -- especially since I didn't give the algorithm any extra Asian faces for negatives.

Here is a 5 min video (VMX screencapture) of me running the "Andrew Ng" detector as I fly around the average male image.

In conclusion, person-specific face detectors from VMX can be used to help determine a person's race. At least the two VMX face detectors I trained behaved as expected. This is far from a full-out study, but I only had the chance to try it out on two subjects and wanted to share what I found. The underlying algorithm inside VMX is a non-parametric exemplar-based model. During training the algorithm uses ideas from max-margin learning to create a separator between the positives and negatives.
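For readers who have not seen max-margin learning before, the sketch below trains a plain linear classifier with the hinge loss using subgradient descent. This is a generic textbook illustration and not the VMX algorithm (which is exemplar-based); the function names, the toy 2D features, and the learning rate and regularization values are all assumptions.

```python
def train_hinge(samples, labels, epochs=50, lr=0.1, reg=0.01):
    """Fit weights w and bias b by subgradient descent on the hinge loss.

    labels must be +1 (positive) or -1 (negative).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularization step: shrink the weights a little every time.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # The example is inside the margin: push the separator away.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def score(w, b, x):
    """Signed score: positive means 'looks like the target'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Toy stand-ins for feature vectors of positives and negatives.
positives = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]
negatives = [(-2.0, -2.0), (-3.0, -2.0), (-2.0, -3.0)]
w, b = train_hinge(positives + negatives, [1, 1, 1, -1, -1, -1])
print(score(w, b, (2.5, 2.5)) > 0, score(w, b, (-2.5, -2.5)) > 0)
```

In a real detector the 2D points would be replaced by high-dimensional image features, but the idea of a separator with a margin between positives and negatives is the same.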

If you've been following up on my computer vision research projects, you should have a good idea of how these things work. I want to mention that while I showcase VMX being used for face detection, there is nothing face-specific inside the algorithm. The same representation is used for bottles, cars, hands, mouths, etc. VMX is a general purpose object recognition ecosystem and we're excited to finally be releasing this technology to the world.

There are lots of cool applications of VMX detectors. What app will you build?

To learn more about VMX and get in on the action, simply check out the VMX Kickstarter project and back our campaign.

Tuesday, January 07, 2014

Tracking points in a live camera feed: A behind-the-scenes look at the VMX Project webapp

In our computer vision startup, vision.ai, we're using open-source tools to create a one-of-a-kind object recognition experience. Our goal is to make state-of-the-art visual object recognition as easy as waving an object in front of your laptop's or smartphone's camera. We've made a webapp and programming environment called VMX that allows you to teach your computer about objects without any advanced programming, nor any bulky software installations -- you'll finally be able to put your computer's new visual reasoning abilities to good use. Today's blog post is about some of the underlying technology that we used to build the VMX prototype. (To learn about the entire project and how you can help, please visit VMX Project on Kickstarter.)

The VMX project utilizes many different programming languages and technologies. Many of the behind-the-scenes machine learning algorithms have been developed in our lab, but to make a good product it takes more than just robust backend algorithms. On the front-end, the two key open source (MIT licensed) projects we rely on are AngularJS and JSFeat. AngularJS is an open-source JavaScript framework, maintained by Google, that assists with running single-page applications. Today's focus will be on JSFeat, the Javascript Computer Vision Library we use inside the front-end webapp. What is JSFeat? Quoting Eugene Zatepyakin, the author of JSFeat, "The project aim is to explore JS/HTML5 possibilities using modern & state-of-art computer vision algorithms."

We use the JSFeat library to track points inside the video stream. Below is a YouTube video of our webapp in action, where we enabled the "debug display" to show you what is happening to tracked points behind the scenes. The blue points are being tracked inside the browser, the green box is the output of our object detection service (already trained on my face), and the black box is the interpolated result which integrates the backend service and the frontend tracker.
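One simple way to combine a slow detector with a fast point tracker is to move the most recent detection box by the typical motion of the tracked points inside it. The sketch below does this with a median displacement. It is only an assumption about how such an interpolation could work, not a description of the VMX webapp, and `propagate_box` is a hypothetical name.

```python
from statistics import median


def propagate_box(box, prev_pts, curr_pts):
    """Shift box = (x, y, w, h) by the median motion of points inside it.

    prev_pts and curr_pts are parallel lists of (x, y) positions for the
    same tracked points in the previous and current frames.
    """
    x, y, w, h = box
    inside = [(p, c) for p, c in zip(prev_pts, curr_pts)
              if x <= p[0] <= x + w and y <= p[1] <= y + h]
    if not inside:
        # No tracked points on the object: keep the last known box.
        return box
    dx = median(c[0] - p[0] for p, c in inside)
    dy = median(c[1] - p[1] for p, c in inside)
    return (x + dx, y + dy, w, h)


last_detection = (10, 10, 20, 20)
prev_pts = [(15, 15), (20, 20), (25, 25), (100, 100)]
curr_pts = [(18, 16), (23, 21), (28, 26), (100, 100)]
print(propagate_box(last_detection, prev_pts, curr_pts))
```

The median is used rather than the mean so that a few badly tracked points do not drag the box away from the object.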

The tracker calculates an optical flow for a sparse feature set using the iterative Lucas-Kanade method with pyramids. The algorithm basically looks at two consecutive video frames and determines how points move by using a straightforward least-squares optimization method. The Lucas-Kanade algorithm is a classic in the computer vision community -- to learn more see the Lucas-Kanade Wikipedia page or take a graduate level computer vision course. Alternatively, if you find me on the street and ask nicely, I might give you an impromptu lecture on optical flow.
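The least-squares step at the heart of Lucas-Kanade fits in a few lines. The sketch below is a single-level, single-iteration version written in plain Python for readability. It is not the JSFeat code, which iterates and uses image pyramids, and the function name and window radius are assumptions.

```python
def lucas_kanade_step(prev, curr, cx, cy, radius=3):
    """Estimate the (u, v) motion of the window centred at (cx, cy).

    prev and curr are 2D lists of grayscale intensities indexed [y][x].
    Solves the 2x2 least-squares system built from the constraint
    Ix*u + Iy*v + It = 0 at every pixel of the window.
    Returns None when the window has too little texture to solve.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - radius, cy + radius + 1):
        for x in range(cx - radius, cx + radius + 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # spatial gradient in y
            it = curr[y][x] - prev[y][x]                  # change between frames
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None  # flat or edge-only region: motion is ambiguous
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# Synthetic frames: a smooth blob that moves by half a pixel in x
# and a quarter of a pixel in y between the two frames.
prev = [[(x - 5) ** 2 + (y - 5) ** 2 for x in range(11)] for y in range(11)]
curr = [[(x - 5.5) ** 2 + (y - 5.25) ** 2 for x in range(11)] for y in range(11)]
print(lucas_kanade_step(prev, curr, 5, 5))
```

The `None` case is the classic aperture problem: on a textureless patch the 2x2 system has no unique solution, which is one reason trackers usually prefer corner-like interest points.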

Instead of using interest points, in our prototype video we used a regularly spaced grid of points covering the entire video stream. This grid gets re-initialized every N seconds. It avoids the extra expense of finding interest points inside every frame. NOTE: inside our vision.ai computer vision lab, we are incessantly experimenting with better ways of integrating point tracks with strong object detector results. What you're seeing is just an early snapshot of the technology in action.
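Here is a small sketch of the grid idea: build evenly spaced points once, then rebuild them when a timer runs out. The spacing, the reset period, and both function names are illustrative assumptions and not values taken from the prototype.

```python
def make_grid(width, height, spacing):
    """Return evenly spaced (x, y) points covering a width x height frame."""
    half = spacing // 2
    return [(x, y)
            for y in range(half, height, spacing)
            for x in range(half, width, spacing)]


def needs_reset(last_reset_time, now, period_seconds):
    """True once period_seconds have passed since the grid was last rebuilt."""
    return now - last_reset_time >= period_seconds


points = make_grid(640, 480, 80)
print(len(points), points[0], points[-1])
```

Resetting matters because tracked points drift and pile up over time. A fresh grid restores even coverage of the frame without paying for interest point detection.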

To play with a Lucas-Kanade tracker, take a look at the JSFeat demo page which runs a point tracker directly inside your browser. You'll have to click on points, one at a time. You'll need Google Chrome or Firefox (just like our VMX project), and this will give you a good sense of what using VMX is going to be like once it is available.

Try the JSFeat Optical Flow Demo!

To summarize, there are lots of great computer vision tools out there, but none of these tools can give you a comprehensive object recognition system which requires little to no programming experience. There is a lot of work needed to put together appropriate machine learning algorithms, object detection libraries, web services, trackers, video codecs, etc. Luckily, the team at vision.ai loves both code and machine learning. In addition, having spent the last 10 years of my life working as a researcher in Computer Vision doesn't hurt.

Getting a PhD in Computer Vision and learning how all of these technologies work is a truly amazing experience. I encourage many students to undertake this 6+ year journey and learn all about computer vision. But I know the PhD path is not for everybody. That's why we've built VMX: so the rest of you can enjoy the power of industrial-grade computer vision algorithms and the ease of intuitive web-based interfaces, without the expertise needed to piece together many different technologies. The number of applications of computer vision tech is astounding and it is a shame that such technology hasn't been delivered with such a low barrier to entry earlier.

With VMX, we're excited that the world is going to experience visual object recognition the way it was meant to be experienced. But for that to happen, we still need your support. Check out our VMX Project on Kickstarter (the page has lots of additional VMX in action videos), and help spread the word.

VMX Project: Computer Vision for Everyone

Monday, January 06, 2014

You asked, we listened. VMX will be available to run locally.

The following post is a result of my team launching a Kickstarter campaign two weeks ago and upgrading one of our rewards based on all the feedback we received from backers and potential backers. We initially intended to launch the VMX project as a service, meaning that it would only run over an internet connection to our servers. But there were scenarios where this was not appropriate. Some people didn't have a fast enough internet connection at home, some people were worried that it would be too expensive to use our product, and some people couldn't use software which required an internet connection at work. The VMX Project, our flagship computer vision in-the-browser software, will now run using a local object detection server.

VMX Project: Computer Vision for Everyone

(Cross-posted from VMX Project Kickstarter January 5, 2014 update and post on blog.vision.ai)

Over the last few weeks, we've listened to many backers (and potential backers) talk about our technology and would like to thank everyone who gave us valuable feedback. Many of you didn’t like VMX being offered only as a service (requiring an internet connection), so we decided to offer a local VMX installation in addition to making VMX available as a service. We didn’t anticipate such great demand for VMX running locally on people’s own computers and networks, but we are dedicated to letting developers have an exceptional computer vision experience and are eager to give our users what they want.

Once the early-access period (March 2014 - June 2014) is over, VMX developers will have the option to receive a single-machine VMX license and install VMX on their own computer. With VMX running on your computer, you won’t have to worry about running out of VMX Compute Hours, accidentally making your data public, and most importantly: it won’t require an internet connection. You will also have the option of communicating between VMX running on your computer and our servers. You will be able to download object detectors, download the models you create during the early-access period, as well as back-up your object models and import them into the VMX as-a-service servers.

During our official launch in Summer 2014, a single-machine VMX license will be available to VMX Developers for $100. Kickstarter backers will be able to simply trade-in 100 of their VMX Compute Hours to obtain one single-machine license and download the software for their own use.

The local VMX software will be installable directly on a computer running Linux. For VMX developers running MS Windows or Apple OS X, we will provide a Linux Virtual Image for download which will contain a pre-installed, and fully configured instance of VMX.

We hope this will make all VMX users more excited about our technology.

--the guys from VISION.AI

Friday, January 03, 2014

The "Blank Check" of Entrepreneurship

Two seconds left. You can feel your quads burning. Your form on the ice is rock solid. As you race past the last defender, the only thing standing between you and the win is the goalie who you’ve beaten one-on-one countless times. You’ve been practicing for this moment your entire life.

You accelerate towards victory, when suddenly… BOOM. An unanticipated body check, from who else, but Steve Blank. You are slammed against the wall and your entire game-plan is thrown off-course. S.G. Blank from the opposing team has just issued his signature move, the "Blank Check" -- you know it’s going to hurt tomorrow. Even minutes after the impact, you can still hear Blank taunting you with his infamous saying, “Get out of the building!”

Getting burned on your way to victory is precisely what happens when you try to execute a business plan devised inside your company walls. As Steve Blank would say, “No business plan ever survives first contact with the customer.” When Steve Blank, author of “The Startup Owner's Manual,” the genius behind the Customer Development process, and entrepreneurship professor at Stanford, gives you a message, he means business. You better write this one down.

Today’s post is dedicated to his iconic message, “Get out of the building.” Your initial goal as a startup founder is not to execute a business plan. Thinking you can build it, and then they will come, is thinking inside the building. You have to go into the world, meet potential customers, talk to them, and learn from these interactions. This is precisely what getting out of the building means. You have to search for a business model. If you plan on scoring without pivots, you are likely planning to fail.

According to Steve Blank, a startup is a temporary organization whose goal is to search for a scalable business model. And the magic can only happen when you go outside your comfort zone, when you talk to people. This is exceptionally difficult for technical founders to grasp. Dear technical masterminds with an itch for entrepreneurship: “getting out of the building” is the single most important piece of advice taken from all of Steve Blank’s writings. Years of training, building things, and being on the forefront of technology have likely given you a skewed perspective on what the world wants. To become a successful entrepreneur, you must first undo the damage of over-education. Once you learn how the world thinks, your technical talent won’t go away, and you’ll be in a great position to lead a great company.

I wrote this blog post for a few reasons. Foremost, as a first-time startup founder, I've been reading endless books on the subject and writing helps me remember what I learned. I guess this blog is now about computer vision and entrepreneurship. Secondly, this post serves as a note-to-self because I've been guilty of engineering products to death and skipping customer development altogether (as you'd expect from a Robotics PhD).

To learn more about Steve Blank and his ideas on entrepreneurship, the Customer Development process, and Lean startups, take a look at the following resources:

Steve Blank

Steve Blank’s Entrepreneurship Blog (has lots of great video links)
http://steveblank.com/

Steve Blank’s Free Udacity Course “How to Build a Startup” (this was my first ever MOOC!)
https://www.udacity.com/course/ep245

Steve Blank’s book “Four Steps to the Epiphany,” which is the most influential book I read in the past 10 years. As groundbreaking as Kuhn's “The Structure of Scientific Revolutions” was to science, Blank's "Epiphany" will likely go down in history as the one that changed the course of entrepreneurship.
http://www.amazon.com/Four-Steps-Epiphany-Steve-Blank/dp/0989200507

P.S. Another great book I just finished reading, Made to Stick, contains a similar point referred to as “The Curse of Knowledge.” The curse of knowledge happens when your message fails to get across because you assumed everybody else is as knowledgeable as you. Or as Steve Blank would say -- you built a feature-rich product flaunting the benefits of advanced features without understanding that the world isn't filled with technical experts. Nobody cares about your features. Not yet. Get out of the building.