Monday, December 23, 2013

VMX: Teach your computer to see without leaving the browser, my Kickstarter project

I’ve spent the last 12 years of my life learning how machines think, and now it’s time to give a little something back.  I’m not just talking about using computers, nor writing ordinary computer programs.  I’m talking about Robotics, Artificial Intelligence, Machine Learning, and Computer Vision.  Throughout these 12 years, I’ve witnessed how engineers and scientists pursue these problems at three great universities: RPI, CMU, and MIT.  I’ve been to 11 research conferences, given many talks, written and co-written many papers, helped teach a few computer vision courses, helped run a few innovation workshops centered around computer vision, and released some open-source computer vision code.

But now, in 2014, most people still struggle with understanding what computer vision is all about and how to get computer vision tools up and running.  I’ve realized that a traditional career in academia would allow me to motivate no more than a few classrooms of students per year.  A rough estimate of 100 students per year across a 30-year career is a mere 3,000 students.  What about everybody else?  One could argue that some of these students would become educators themselves and the wonderful art of computer vision would reach well beyond those 3,000.  But I can’t wait.  I don’t want to wait.  Computer vision is too awesome.  I’m too excited.  It’s time for everybody to feel this excitement.

So I decided to do something crazy.  Something I wanted to do for a long time, but only recently realized that it would not be possible to do inside the confines of a University.  I recruited the craziest and most bad-ass developer I’ve ever encountered and decided to do the following: convert advanced computer vision technology into a product form that would be so easy to use, a kid without any programming knowledge could train his own object detectors.

I’ve been working non-stop with my colleague and cofounder at our new company, vision.ai, to bring you the following Kickstarter campaign:



What if your computer was just a little bit smarter? What if it could understand what is going on in its surroundings merely by looking at the world through a camera? Such technology could make games more engaging and our interactions with computers more seamless, and could automate many of our daily chores and responsibilities. We believe that new technology shouldn’t require advanced knobs, long manuals, or domain expertise.

The VMX project was designed to bring cutting-edge computer vision technology to a very broad audience: hobbyists, researchers, artists, students, roboticists, engineers, and entrepreneurs. Not only will we educate you about potential uses of computer vision with our very own open-source vision apps, but the VMX project will give you all the tools you need to bring your own creative computer vision projects to life.

VMX gives individuals all they need to effortlessly build their very own computer vision applications. Our technology is built on top of 10+ years of research experience acquired from CMU, MIT, and Google. By leaving the hard stuff to us, you will be able to focus on creative uses of computer vision without the headaches of mastering machine learning algorithms or managing expensive computations. You won’t need to be a C++ guru or know anything about statistical machine learning algorithms to start using laboratory-grade computer vision tools for your own creative uses.

In order to make the barrier to entry for computer vision as low as possible, we built VMX directly in the browser and made sure that it requires no extra hardware. All you need is a laptop with a webcam and an internet connection. Because browsers such as Chrome and Firefox can read video directly from a webcam, you most likely have all of the required software and hardware. The only thing missing is VMX.
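
For the curious, here is a minimal sketch (in TypeScript, and not VMX code) of how a page in a modern browser can grab webcam frames using the standard getUserMedia API; it simply illustrates why a laptop, a webcam, and a browser are all the hardware and software you need.

// Minimal sketch: grab a webcam frame in the browser using the standard
// MediaDevices API. Illustration only -- this is not VMX code.
async function captureWebcamFrame(): Promise<ImageData | null> {
  // Ask the user for camera access; the browser shows a permission prompt.
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });

  const video = document.createElement("video");
  video.srcObject = stream;
  await video.play(); // videoWidth/videoHeight are known once playback starts

  // Copy the current frame into a canvas so its pixels can be inspected,
  // e.g., handed off to an object detector.
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext("2d");
  if (!ctx) return null;
  ctx.drawImage(video, 0, 0);
  return ctx.getImageData(0, 0, canvas.width, canvas.height);
}

captureWebcamFrame()
  .then((frame) => frame && console.log(`Captured a ${frame.width}x${frame.height} frame`))
  .catch((err) => console.error("Webcam access failed:", err));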

We're truly excited about what is going to happen next, but we need your help!  Please spread the word, and if you're even mildly excited about computer vision, consider supporting this project.

Thanks Everyone!
Tomasz, @quantombone, author of tombone's computer vision blog

P.S. I'm not telling you what VMX stands for...


Monday, October 28, 2013

Just Add Vision: Turning Computers Into Robots

The future of technology is all about improving the human experience. And the human experience is all about you -- you filling your life with less tedious work, more fun, less discomfort, and more meaningful human interactions. Whether new technology will let us enjoy life more during our spare time (think of what big screen TVs did for entertainment), or, let us become more productive at work (think of what calculators did for engineers), successful technologies have the tendency to improve our quality of life. 

Let’s take a quick look at how things got started... 



IBM started a chain of events by building affordable computers for small businesses to increase their productivity. Microsoft and Apple then created easy-to-use operating systems which allowed the common man to use computers at home, both for entertainment (computer games) and for productivity (MS Office). Once personal computers started entering our homes, it was only a matter of years until broadband internet access became widespread. Google then came along and changed the way we retrieve information from the internet, while social networking redefined how we interact with the people in our lives. Let's not forget modern smartphones, which let us use all of this amazing technology while on the go!

Surely our iPhones will get faster and smaller while Google search will become more robust, but does the way we interact with these devices have to stay the same? And will these devices always do the same things? 

Computers without keyboards 
A lot of the world’s most exciting technology is designed to be used directly by people and ceases to provide much value once we stop directly interacting with our devices. I honestly believe that instead of wearing more computing devices (such as Google Glass) and learning new iOS commands, what we need is technology that can do useful things on its own, without requiring a person to hit buttons or tap custom keyboards. Because doing useful things entails having some sort of computational unit inside, it is fair to think of these future devices as “computers.” However, making computers do useful things on their own requires making machines intelligent, something which has yet to reach the masses, so I think a better name for these devices is robots.

What is a robot? 
If we want machines to help us out in our daily tasks (e.g., cleaning, cooking, driving, playing with us, teaching us), we need machines that can both perceive their immediate environment and act intelligently. The perception-and-action loop is all that is necessary in order to turn everyday computers into intelligent robots. While it would be “nice” to build full-fledged humanoid robots, in my opinion a robot is any device capable of executing its own perception-and-action loop. Thus, humanoids are not necessary before we start reaping the benefits of in-home consumer robotics. Once we stop looking for smart machines with legs and broaden our definition of a robot, it is easy to tell that the revolution has already begun.

Current desktop computers and laptops, which require input in the form of a key being pressed or a movement on the trackpad, can be viewed as semi-intelligent machines -- but because their input interfaces render the perception problem unnecessary, I do not consider them full-fledged robots. However, an iPhone running Siri is capable of sending a text message to one of our contacts via speech, so to some extent I consider Siri-enabled iPhones to be robots. Tasks such as cleaning cannot be easily automated using Siri because no matter how dirty a floor is, it will never exclaim, “I’m dirty, please clean me!” What we need is the ability for our devices to see -- namely, to recognize objects in the environment (is this a sofa or a chair?), infer their state (clean vs. dirty), and track their spatial extent in the environment (these pixels belong to the plate).
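
To make the perception-and-action loop concrete, here is a rough TypeScript sketch. Every name in it (Detection, detectObjects, vacuumArea) is a hypothetical placeholder invented for illustration, not part of any real library.

// Rough sketch of a perception-and-action loop. All of these names are
// hypothetical placeholders, not a real API.
interface Detection {
  label: string;                   // recognize: "sofa", "chair", "plate", "floor", ...
  state: "clean" | "dirty";        // infer state
  pixels: Array<[number, number]>; // spatial extent: which pixels belong to the object
}

// Perception: turn a camera frame into a list of recognized objects.
declare function detectObjects(frame: ImageData): Detection[];

// Action: something the machine can actually do about what it saw.
declare function vacuumArea(pixels: Array<[number, number]>): void;

function perceptionActionStep(frame: ImageData): void {
  for (const obj of detectObjects(frame)) {
    // The floor will never exclaim "I'm dirty, please clean me!", so the
    // machine must infer that state from pixels and then act on it.
    if (obj.label === "floor" && obj.state === "dirty") {
      vacuumArea(obj.pixels);
    }
  }
}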

Just add vision
We have spent decades using keyboards and mice, essentially learning a machine-specific language for communicating with our devices. Whether you consider keystrokes a high-level or low-level language is beside the point -- it is still a language, and more specifically a language which requires inputting everything explicitly. If we want machines to effortlessly interact with the world, we need to teach them our language and let them perceive the world directly. With the current advancements in computer vision, this is becoming a reality. But the world needs more visionary thinkers to become computer vision experts, more vision experts to start caring about broader uses of their technology, more everyday programmers to use computer vision in their projects, and more expert-grade computer vision tools accessible to those just starting out. Only then will we be able to pool our collective efforts and finally interweave in-home robotics with the everyday human experience.

What's next?
Wouldn’t it be great if we had a general-purpose machine vision API which would render the most tedious and time-consuming part of training object detectors obsolete? Wouldn't it be awesome if we could all use computer vision without becoming mathematics gurus or having years of software engineering experience?  Well, this might be happening sooner than you think.  In an upcoming blog post, I will describe what this API is going to look like and why it’s going to make your life a whole lot easier.  I promise not to disappoint...
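
Purely as a thought experiment (the real interface is the subject of that upcoming post), here is a TypeScript sketch of how such a general-purpose vision API might feel to a programmer. Every endpoint, type, and parameter below is hypothetical and invented for illustration; none of it describes VMX or any existing service.

// Thought experiment only: every URL, endpoint, and type here is hypothetical
// and does NOT describe VMX or any real service.
interface DetectedObject {
  label: string;       // e.g., "coffee mug"
  confidence: number;  // 0..1
  box: { x: number; y: number; width: number; height: number };
}

// Hypothetical: create an object detector from a handful of example images,
// hiding the machine-learning machinery behind a single call.
async function trainDetector(name: string, examples: Blob[]): Promise<string> {
  const form = new FormData();
  examples.forEach((img, i) => form.append(`example_${i}`, img));
  const res = await fetch(
    `https://vision.example.com/detectors?name=${encodeURIComponent(name)}`,
    { method: "POST", body: form }
  );
  const { detectorId } = await res.json();
  return detectorId;
}

// Hypothetical: run a trained detector on a single camera frame.
async function detect(detectorId: string, frame: Blob): Promise<DetectedObject[]> {
  const form = new FormData();
  form.append("image", frame);
  const res = await fetch(`https://vision.example.com/detectors/${detectorId}/detect`, {
    method: "POST",
    body: form,
  });
  return (await res.json()) as DetectedObject[];
}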