Monday, October 28, 2013

Just Add Vision: Turning Computers Into Robots

The future of technology is all about improving the human experience. And the human experience is all about you -- you filling your life with less tedious work, more fun, less discomfort, and more meaningful human interactions. Whether new technology lets us enjoy life more during our spare time (think of what big screen TVs did for entertainment) or lets us become more productive at work (think of what calculators did for engineers), successful technologies tend to improve our quality of life.

Let’s take a quick look at how things got started... 

IBM started a chain of events by building affordable computers for small businesses to increase their productivity. Microsoft and Apple then created easy-to-use operating systems which allowed the common man to use computers at home for both entertainment (computer games) and productivity (MS Office). Once personal computers started entering our homes, it was only a matter of a few years until broadband internet access became widespread. Google then came along and changed the way we retrieve information from the internet, while social networking redefined how we interact with the people in our lives. Let's not forget modern smartphones, which let us use all of this amazing technology while on the go!

Surely our iPhones will get faster and smaller while Google search will become more robust, but does the way we interact with these devices have to stay the same? And will these devices always do the same things? 

Computers without keyboards 
A lot of the world’s most exciting technology is designed to be used directly by people and ceases to provide much value once we stop directly interacting with our devices. I honestly believe that instead of wearing more computing devices (such as Google Glass) and learning new iOS commands, what we need is technology that can do useful things on its own, without requiring a person to hit buttons or custom keyboards. Because doing useful things entails having some sort of computational unit inside, it is fair to think of these future devices as “computers.” However, making computers do useful things on their own requires making machines intelligent, something which has yet to reach the masses, so I think a better name for these devices is robots.

What is a robot? 
If we want machines to help us out in our daily tasks (e.g., cleaning, cooking, driving, playing with us, teaching us) we need machines that can both perceive their immediate environment and act intelligently. The perception-and-action loop is all that is necessary in order to turn everyday computers into intelligent robots.

While it would be “nice” to build humanoid robots, in my opinion a robot is any device capable of executing its own perception-and-action loop. Thus, it is not necessary to have full-fledged humanoid robots to start reaping the benefits of consumer in-home robotics. Once we stop looking for smart machines with legs, and broaden our definition of a robot, it is easy to tell that the revolution has already begun.

Current desktop computers and laptops, which require input in the form of a key being pressed or a movement on the trackpad, can be viewed as semi-intelligent machines -- but because the input interfaces render the perception problem unnecessary, I do not consider them full-fledged robots. However, an iPhone running Siri is capable of sending a text message to one of our contacts via speech, so to some extent I consider Siri-enabled iPhones to be robots. Tasks such as cleaning cannot be easily automated using Siri because no matter how dirty a floor is, it will never exclaim, “I’m dirty, please clean me!” What we need is the ability for our devices to see -- namely, recognize objects in the environment (is this a sofa or a chair?), infer their state (clean vs. dirty), and track their spatial extent in the environment (these pixels belong to the plate).
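
To make the perception-and-action loop a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names Percept, ToyWorld, decide, and run_loop are hypothetical, and the toy world stands in for a real camera and real actuators. No actual vision library or robot API is implied. Each percept carries the three things listed above: what the object is, what state it is in, and which pixels belong to it.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Pixel = Tuple[int, int]  # (row, column) in the camera image


@dataclass(frozen=True)
class Percept:
    """One perceived object (hypothetical structure, for illustration only)."""
    label: str                 # what the object is, e.g. "floor" or "plate"
    state: str                 # inferred state, e.g. "clean" or "dirty"
    pixels: FrozenSet[Pixel]   # spatial extent: which pixels belong to it


def decide(percepts: List[Percept]) -> List[str]:
    """Toy decision rule: clean anything that looks dirty."""
    return [f"clean {p.label}" for p in percepts if p.state == "dirty"]


def run_loop(perceive: Callable[[], List[Percept]],
             act: Callable[[str], None],
             steps: int) -> List[str]:
    """Run the perceive -> decide -> act cycle and return the actions taken."""
    log: List[str] = []
    for _ in range(steps):
        for action in decide(perceive()):
            act(action)
            log.append(action)
    return log


class ToyWorld:
    """Stand-in for a real sensor and actuator (an assumption of this sketch)."""

    def __init__(self) -> None:
        self.states = {"floor": "dirty", "plate": "clean"}

    def perceive(self) -> List[Percept]:
        # A real system would run object recognition on camera frames here.
        return [Percept(label, state, frozenset({(0, 0)}))
                for label, state in self.states.items()]

    def act(self, action: str) -> None:
        verb, label = action.split(" ", 1)
        if verb == "clean":
            self.states[label] = "clean"


world = ToyWorld()
print(run_loop(world.perceive, world.act, steps=3))
```

Note that nobody tells the loop the floor is dirty; it finds that out through perceive(), acts once, and then does nothing further because the next round of perception reports a clean floor. That is the difference from a keyboard-driven machine, where a person would have to supply the state explicitly.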

Just add vision
We have spent decades using keyboards and mice, essentially learning a machine-specific language between us and machines. Whether you consider keystrokes as a high-level or low-level language is beside the point -- it is still a language, and more specifically a language which requires inputting everything explicitly. If we want machines to effortlessly interact with the world, we need to teach them our language and let them perceive the world directly. With the current advancements in computer vision, this is becoming a reality. But the world needs more visionary thinkers to become computer vision experts, more vision experts to start caring about broader uses of their technology, more everyday programmers to use computer vision in their projects, and more expert-grade computer vision tools accessible to those just starting out. Only then will we be able to pool our collective efforts and finally interweave in-home robotics with the everyday human experience.

What's next?
Wouldn’t it be great if we had a general-purpose machine vision API which would render the most tedious and time-consuming part of training object detectors obsolete? Wouldn't it be awesome if we could all use computer vision without becoming mathematics gurus or having years of software engineering experience?  Well, this might be happening sooner than you think.  In an upcoming blog post, I will describe what this API is going to look like and why it’s going to make your life a whole lot easier.  I promise not to disappoint...


  1. Are you promising to publish a vision API in the tradition of vl-feat or sklearn-- something higher-level than OpenCV that will abstract away all the tedious work of training detectors/ classifiers? Or will your future post be a visionary essay about what we in the vision community should be striving to create?

  2. Hi Genevieve,
    I have been working with my team on something a bit higher-level than VLfeat and OpenCV. The problem with using these software packages is that you still have to gather your own training data, label this data, know how to select the right classifier, train on a cluster, etc... We are after something a bit simpler and more automatic and have been working on a 'product' that will make this a reality. I haven't made the announcement official, but I'm no longer a postdoc at MIT and have been working full time with my startup team on this new and exciting frontier.

    I will be less and less vague in the upcoming weeks, and eventually post videos/code that show how to use our API. Hopefully you and your friends will get to test-run our cool new tech very soon. :-)

  3. Thanks Tomasz~
    Stay tuned.

  4. A store door that opens automatically when I walk toward it can be considered robotic. It has the two elements of perception and actuation. It has a rudimentary form of decision making, the third element. Many systems lack actuation other than communication. The most complete home system that I know of is iRobot which sweeps our floors. Google's self-driving cars are complete systems. When will we have robots shovel our driveways?

  5. I think actuation will ultimately be very important, but we first need to see more primitive "perception-then-communication" systems. I agree that iRobot has some nice complete systems and Google's self-driving car is probably the best example of such systems today.

    I don't see a general-purpose shoveling robot (which can handle the variations in people's driveways) coming anytime soon. I think what is necessary is for people to teach their shoveling robots about their own driveways, perhaps by taking the robot shoveling with them a couple of times first. This means that we need a mechanism for showing these robots what to do and letting the robots adjust to our environments. Such a mechanism cannot be something which is done in the R&D lab, the way robots are currently programmed. We will need a sort of teaching-by-demonstration interface which will allow the robots to be taught by people without resorting to programming.

    I think vision is going to be the interface of the future between people and robots. I envision a future human-robot interface where we can say, "Look over here, robot, this is object X" and then follow up with something like, "I want you to place X over here whenever you see it at location L."

    There are currently too many high-level AI problems that need to be solved to make general purpose shoveling robots. But I think teaching your robot to shovel YOUR driveway is quite possible.

  6. thanks for the cute robot :) and also the information u have .
    nice job .

  7. Wow, your product sounds like it's going to be incredible! Looking forward to it!

  8. Thanks for sharing nice job.

  9. This is amazingly great.

  10. The company I founded based on the idea presented in this blog post finally launched a Kickstarter! You can see a preview of the API I was alluding to in our video! Our campaign is running until the end of January, but we need guys like you to share the word and contribute if you can. It's really an exciting time for us, and we can't wait to get this technology in your hands!

    If you have enjoyed any of my research papers and/or blog posts, consider making a contribution to our Kickstarter campaign!

  11. specifically a language which requires inputting everything explicitly.

  12. necessary to have full-fledged humanoid robots to start reaping the benefit of consumer-robotics in-home robotics.

  13. Pretty excited after reading this post... Looking forward eagerly to witnessing such an era..:)