The VMX project utilizes many different programming languages and technologies. Many of the behind-the-scenes machine learning algorithms have been developed in our lab, but making a good product takes more than just robust backend algorithms. On the front-end, the two key open source (MIT licensed) projects we rely on are AngularJS and JSFeat. AngularJS is an open-source JavaScript framework, maintained by Google, that assists with building single-page applications. Today's focus will be on JSFeat, the JavaScript computer vision library we use inside the front-end webapp. What is JSFeat? Quoting Eugene Zatepyakin, the author of JSFeat: "The project aim is to explore JS/HTML5 possibilities using modern & state-of-art computer vision algorithms."
We use the JSFeat library to track points inside the video stream. Below is a YouTube video of our webapp in action, where we enabled the "debug display" to show you what is happening to tracked points behind the scenes. The blue points are being tracked inside the browser, the green box is the output of our object detection service (already trained on my face), and the black box is the interpolated result which integrates the backend service and the frontend tracker.
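I won't detail VMX's exact interpolation logic here, but to give a flavor of the idea, one simple way to integrate a (possibly stale) detector box with live point tracks is to shift the box by the median displacement of the tracked points. Here is a minimal sketch; the names (`propagateBox`, `lastDetection`) are illustrative, not VMX internals:

```javascript
// Illustrative sketch (not VMX internals): propagate the most recent
// backend detection using the median displacement of the point tracks.
function median(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  return sorted[Math.floor(sorted.length / 2)];
}

// lastDetection: {x, y, width, height} from the object detection service.
// prevPoints / currPoints: matched arrays of {x, y} tracker positions.
function propagateBox(lastDetection, prevPoints, currPoints) {
  var dxs = [], dys = [];
  for (var i = 0; i < currPoints.length; i++) {
    dxs.push(currPoints[i].x - prevPoints[i].x);
    dys.push(currPoints[i].y - prevPoints[i].y);
  }
  // The median motion is robust to a handful of badly tracked points.
  return {
    x: lastDetection.x + median(dxs),
    y: lastDetection.y + median(dys),
    width: lastDetection.width,
    height: lastDetection.height
  };
}
```

Between detector responses, a box can be advanced every frame this way, then snapped back into place whenever a fresh detection arrives from the backend.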
In our prototype video, instead of using interest points, we used a regularly spaced grid of points covering the entire video frame. The grid gets re-initialized every N seconds, which avoids the extra expense of detecting interest points in every frame (see the sketch below). NOTE: inside our vision.ai computer vision lab, we are incessantly experimenting with better ways of integrating point tracks with strong object detector results. What you're seeing is just an early snapshot of the technology in action.
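For the curious, here is a minimal sketch of what grid-based tracking looks like with JSFeat's pyramidal Lucas-Kanade tracker, modeled on the library's published optical-flow sample. The frame size, grid spacing, re-initialization interval, and tracker parameters are illustrative choices, not the settings we use in VMX:

```javascript
// Sketch of grid-based point tracking with JSFeat's pyramidal
// Lucas-Kanade tracker. Assumes jsfeat.js is loaded globally and that
// frames arrive as 640x480 RGBA ImageData from a <canvas> 2D context.
var w = 640, h = 480, spacing = 20;
var maxPoints = ((w / spacing) | 0) * ((h / spacing) | 0);

var prevPyr = new jsfeat.pyramid_t(3);
var currPyr = new jsfeat.pyramid_t(3);
prevPyr.allocate(w, h, jsfeat.U8_t | jsfeat.C1_t);
currPyr.allocate(w, h, jsfeat.U8_t | jsfeat.C1_t);

var prevXY = new Float32Array(maxPoints * 2);
var currXY = new Float32Array(maxPoints * 2);
var status = new Uint8Array(maxPoints);
var pointCount = 0;

// Lay down a regularly spaced grid instead of detecting interest points.
function initGrid() {
  pointCount = 0;
  for (var y = spacing / 2; y < h; y += spacing) {
    for (var x = spacing / 2; x < w; x += spacing) {
      currXY[pointCount * 2] = x;
      currXY[pointCount * 2 + 1] = y;
      pointCount++;
    }
  }
}

// Call once per video frame.
function onFrame(imageData) {
  // Swap buffers: last frame's pyramid and points become the inputs.
  var pyr = prevPyr; prevPyr = currPyr; currPyr = pyr;
  var xy = prevXY; prevXY = currXY; currXY = xy;

  jsfeat.imgproc.grayscale(imageData.data, w, h, currPyr.data[0]);
  currPyr.build(currPyr.data[0], true);

  jsfeat.optical_flow_lk.track(
    prevPyr, currPyr, prevXY, currXY, pointCount,
    20,     // tracking window size (pixels)
    30,     // max iterations per point
    status, // 1 if a point was tracked successfully, 0 if lost
    0.01,   // iteration stopping threshold (epsilon)
    0.001); // minimum eigenvalue threshold
}

// Re-initialize the grid every N seconds (N = 2 here, arbitrarily).
initGrid();
setInterval(initGrid, 2000);
```

The status array is what lets a debug display drop lost points: entries that come back 0 simply aren't drawn or used until the next grid re-initialization.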
To play with a Lucas-Kanade tracker yourself, take a look at the JSFeat demo page, which runs a point tracker directly inside your browser. You'll have to click on points, one at a time. You'll need Google Chrome or Firefox (just like our VMX project), and the demo will give you a good sense of what using VMX is going to be like once it is available.
Try the JSFeat Optical Flow Demo!
To summarize, there are lots of great computer vision tools out there, but none of them gives you a comprehensive object recognition system that requires little-to-no programming experience. A lot of work is needed to put together the appropriate machine learning algorithms, object detection libraries, web services, trackers, video codecs, etc. Luckily, the team at vision.ai loves both code and machine learning. In addition, having spent the last 10 years of my life working as a researcher in Computer Vision doesn't hurt.
Getting a PhD in Computer Vision and learning how all of these technologies work is a truly amazing experience. I encourage many students to undertake this 6+ year journey and learn all about computer vision. But I know the PhD path is not for everybody. That's why we've built VMX: so the rest of you can enjoy the power of industrial-grade computer vision algorithms and the ease of intuitive web-based interfaces, without the expertise needed to piece together many different technologies. The number of applications for computer vision tech is astounding, and it is a shame that such technology hasn't been delivered with a lower barrier-to-entry before now.
With VMX, we're excited that the world is going to experience visual object recognition the way it was meant to be experienced. But for that to happen, we still need your support. Check out our VMX Project on Kickstarter (the page has lots of additional videos of VMX in action), and help spread the word.
Comments

Harlow: I would like to show it an outside photo and ask where it was taken. I can listen to music and identify the CD [much easier, of course].

Reply: Hi Harlow, that is not an easy task, but here is a link to some work I did a few years back with colleagues at CMU on this image matching task: http://graphics.cs.cmu.edu/projects/crossDomainMatching/