Tombone's Computer Vision Blog: March 2015

Thursday, March 26, 2015

Venture Pitch Contest at CVPR 2015 in Boston, MA

This year's CVPR will be in Boston, and as always, I expect it to be the single best venue to meet computer vision experts and see cutting edge research. I expect Google and Facebook to show off their best Deep Learning systems, NVIDIA to demo their newest GPUs, and dozens of computer vision startups to be looking for talent to grow their teams.

I expect the entrepreneur/academic ratio to be much higher, as it is getting easier for PhD students and postdocs to start their own companies. This year's CVPR will even feature a Venture Pitch Contest as part of the Fourth Annual Vision Industry and Entrepreneur (VIEW) Workshop at CVPR. From the VIEW workshop webpage:

Computer vision as a technology is penetrating the industry at an extraordinary pace with many computer vision applications directly becoming consumer commodities. Both startups and big companies have contributed to this trend. At the fourth annual Vision Industry and Entrepreneur Workshop, we are organizing a first of its kind Startup Pitch Contest. As a computer vision innovator, this is your chance to present the next great computer vision product idea to a distinguished panel of judges which will include Venture Capitalists, Investors and leading Researchers in the field.

Applications should employ novel computer vision technologies towards an innovative product. The best submissions would be selected for an Elevator Pitch presentation in front of the judges. Prizes would be awarded to the winners who would be announced at the end of the workshop. The details about the judging criteria will be posted on the website.

The submission is broken into two phases – Preliminary submission consisting of a title and an abstract, and, Final submission consisting of a one page summary with technology overview, feasibility, outreach (customers and market size) and monetization (business model). The summary should be tailored at soliciting funding from sources such as venture capital to invest in the idea. The applicants should indicate whether they are academic researchers or industry professionals. Only non-confidential material may be submitted.

Even if you're not ready to pitch, you can submit a poster or demo to the Industry Session part of the VIEW 2015 Workshop. Great place to show off your new computer vision-powered app. One of the organizers, Samson Timoner, told me the deadlines for submission have been extended. Here are the new dates:

Submission: April 3, 2015 (extended)
Notification: April 8, 2015 (extended)
Workshop: June 11, 2015

This year's CVPR is going to be a great place to network with startups, share ideas, see cutting-edge research and (NEW in 2015) meet folks from the venture capital world. Who knows, if I'm there, I might be wearing a vision.ai T-shirt.

Mobileye's quest to put Deep Learning inside every new car

In Amnon Shashua's vision of the future, every car can see. He's convinced that the key technology behind the imminent driving revolution is going to be computer vision, and to experience this technology, we won't have to wait for fully autonomous cars to become mainstream. I had the chance to hear Shashua's vision of the future this past Monday, and from what I'm about to tell you, it looks like there's going to be a whole lot of Deep Learning inside tomorrow's car. Cars equipped with Deep Learning-based pedestrian avoidance systems (See Figure 1) can sense people and dangerous situations while you're behind the wheel. From winning large-scale object recognition competitions like ImageNet, to heavy internal use by Google, Deep Learning is now at the foundation of many hi-tech startups and giants. And when it comes to cars, Deep Learning promises to give us both safer roads and the highly-anticipated hands-free driving experience.

Mobileye's Deep Learning-based Pedestrian Detector

Mobileye Co-founder Amnon Shashua shares his vision during an invited lecture at MIT

Amnon Shashua is the Co-founder & CTO of Mobileye and this past Monday (March 23, 2015) he gave a compelling talk at MIT’s Brains, Minds & Machines Seminar Series titled “Computer Vision that is Changing Our Lives”. Shashua discussed Mobileye’s Deep Learning chips, robots, autonomous driving, as well as introduced his most recent project, a wearable computer vision unit called OrCam.

Fig 2. Prof Amnon Shashua, CTO of Mobileye

Let's take a deeper look at the man behind Mobileye and his vision. Below is my summary of Shashua's talk as well as some personal insights regarding Mobileye's embedded computer vision technology and how it relates to cloud-based computer vision.

Mobileye's academic roots
You might have heard stories of bold entrepreneurs dropping out of college to form million dollar startups, but this isn't one of them. This is the story of a professor who turned his ideas into a publicly traded company, Mobileye (NYSE:MBLY). Amnon Shashua is a Professor at Hebrew University, and his lifetime achievements suggest that for high-tech entrepreneurship, it is pretty cool to stay in school. And while Shashua and I never overlapped academically (he is 23 years older than me), both of us spent some time at MIT as postdoctoral researchers.

Deep Learning's impact on Mobileye

During his presentation at MIT, Amnon Shashua showcased a wide array of computer vision problems that are currently being solved by Mobileye's real-time computer vision systems. These systems are image-based and do not require expensive 3D sensors such as the ones commonly found on top of self-driving cars. He showed videos of real-time lane detection, pedestrian detection, animal detection, and road surface detection. I have seen many similar visualizations during my academic career; however, Shashua emphasized that deep learning is now used to power most of Mobileye's computer vision systems.

Question: I genuinely wonder how much the shift to Deep methods improved Mobileye's algorithms, or if the move is a strategic technology upgrade to stay relevant in an era where Google and the competition are feverishly pouncing on the landscape of deep learning. There's a lot of competition on the hardware front, and it seems like the chase for ASIC-like Deep Learning Miners/Trainers is on.

The AlexNet CNN diagram from the popular Krizhevsky/Sutskever/Hinton paper. Shashua explicitly mentioned the AlexNet model during his MIT talk, and it appears that Mobileye has done their Deep Learning homework.

The early Mobileye: Mobileye didn’t wait for the deep learning revolution to happen. They started shipping computer vision technology for vehicles using traditional techniques more than a decade ago. In fact, I attended a Mobileye presentation at CMU almost a full decade ago -- it was given by Andras Ferencz at the 2005 CMU VASC Seminar. This week's talk by Shashua suggests that Mobileye was able to successfully modernize their algorithms to use deep learning.

Further reading: To learn about object recognition methods in computer vision which were popular before Deep Learning, see my January blog post, titled From feature descriptors to deep learning: 20 years of computer vision.

Fig 3. "Deep Learning at Mobileye" presentation at the 2015 Deutsche Bank Global Auto Industry Conference.

Mobileye's custom Computer Vision hardware

Mobileye is not a software computer vision company -- they bake their algorithms into custom computer vision chips. Shashua reported some impressive computation speeds on what appear to be tiny vision chips. Their custom hardware is more specific than GPUs (which are quite common for deep learning, scientific computations, and computer graphics, and are actually affordable). But Mobileye chips do not need to perform the computationally expensive big-data training stage onboard, so their devices can be much leaner than GPUs. Mobileye has lots of hardware experience, and regarding machine learning, Shashua mentioned that Mobileye has more vehicle-related training data than they know what to do with.

Fig 4. The Mobileye Q2 lane detection chip.

Embedded vs. Cloud-based computer vision
While Mobileye makes a strong case for embedded computer vision, there are many scenarios today where the alternative cloud-based computer vision approach triumphs. Cloud-based computer vision is about delivering powerful algorithms as a service, over the web. In a cloud-based architecture, the algorithms live in a data center and applications talk to the vision backend via an API layer. And while certain mission-critical applications cannot have a cloud component (e.g., a drone flying over the desert), cloud-based vision systems promise to turn laptops and smartphones into smart devices, without the need to bake algorithms into chips. In-home surveillance apps, home-automation apps, exploratory robotics projects, and even scientific research can benefit from cloud-based computer vision. Most importantly, cloud-based deployment means that startups can innovate faster, and entire products can evolve much faster.

Unlike Mobileye's decade-long journey, I suspect cloud-based computer vision platforms are going to make computer vision development much faster, giving developers a Heroku-like button for visual AI. Choosing diverse compilation targets such as a custom chip or Javascript will be handled by the computer vision platform, allowing computer vision developers to work smarter and deploy to more devices.
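
To make the "applications talk to the vision backend via an API layer" idea concrete, here is a minimal sketch of what a client-side call might look like. Everything specific in it is an assumption for illustration: the endpoint URL, the `image_b64` field name, and the `build_detect_request` helper are hypothetical and do not describe any real service's API. The sketch only builds the HTTP request and never sends it.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint, used purely for illustration.
DEFAULT_ENDPOINT = "https://vision.example.com/v1/detect"


def build_detect_request(image_bytes, endpoint=DEFAULT_ENDPOINT):
    """Package raw image bytes into a JSON POST request for a vision backend.

    The payload layout (a single base64 field named 'image_b64') is an
    assumption; a real service would document its own schema.
    """
    payload = json.dumps(
        {"image_b64": base64.b64encode(image_bytes).decode("ascii")}
    ).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Build (but do not send) a request for a few placeholder bytes.
request = build_detect_request(b"\x89PNG placeholder bytes")
print(request.get_method(), request.full_url)
```

The point of the design is that the device only needs a camera and a network connection; the heavy model stays in the data center and can be upgraded without touching the client.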

Conclusion and Predictions

Even if you don't believe that today's computer vision-based safety features make cars smart enough to call them robots, driving tomorrow's car is sure going to feel different. I will leave you with one final note: Mobileye's CTO hinted that if you are going to design a car in 2015 on top of computer vision tech, you might reconsider traditional safety features such as airbags, and create a leaner, less-expensive AI-enabled vehicle.

Fig 5. Mobileye technology illustration [safety.trw.com].

Watch the Mobileye presentation on YouTube: If you are interested in embedded deep learning, autonomous vehicles, or want to get a taste of how the industry veterans compile their deep networks into chips, you can watch the full 38-minute video of Amnon's January 2015 Mobileye presentation.

I hope you learned a little bit about vehicle computer vision systems, embedded Deep Learning, and got a glimpse of the visual intelligence revolution that is happening today. Feel free to comment below, follow me on Twitter (@quantombone), or sign-up to the vision.ai mailing list if you are a developer interested in taking vision.ai's cloud-based computer vision platform for a spin.


Friday, March 20, 2015

Deep Learning vs Machine Learning vs Pattern Recognition

Let's take a close look at three related terms (Deep Learning vs Machine Learning vs Pattern Recognition), and see how they relate to some of the hottest tech-themes in 2015 (namely Robotics and Artificial Intelligence). In our short journey through jargon, you should acquire a better understanding of how computer vision fits in, as well as gain an intuitive feel for how the machine learning zeitgeist has slowly evolved over time.

Fig 1. Putting a human inside a computer is not Artificial Intelligence

(Photo from WorkFusion Blog)

If you look around, you'll see no shortage of jobs at high-tech startups looking for machine learning experts. While only a fraction of them are looking for Deep Learning experts, I bet most of these startups can benefit from even the most elementary kind of data scientist. So how do you spot a future data-scientist? You learn how they think.

The three highly-related "learning" buzz words

“Pattern recognition,” “machine learning,” and “deep learning” represent three different schools of thought. Pattern recognition is the oldest (and as a term is quite outdated). Machine Learning is the most fundamental (one of the hottest areas for startups and research labs as of today, early 2015). And Deep Learning is the new, the big, the bleeding-edge -- we’re not even close to thinking about the post-deep-learning era. Just take a look at the following Google Trends graph. You'll see that a) Machine Learning is rising like a true champion, b) Pattern Recognition started as synonymous with Machine Learning, c) Pattern Recognition is dying, and d) Deep Learning is new and rising fast.

1. Pattern Recognition: The birth of smart programs

Pattern recognition was a term popular in the 70s and 80s. The emphasis was on getting a computer program to do something “smart” like recognize the character "3". And it really took a lot of cleverness and intuition to build such a program. Just think of "3" vs "B" and "3" vs "8". Back in the day, it didn’t really matter how you did it as long as there was no human-in-a-box pretending to be a machine. (See Figure 1) So if your algorithm would apply some filters to an image, localize some edges, and apply morphological operators, it was definitely of interest to the pattern recognition community. Optical Character Recognition grew out of this community and it is fair to call “Pattern Recognition” the “Smart” Signal Processing of the 70s, 80s, and early 90s. Decision trees, heuristics, quadratic discriminant analysis, etc. all came out of this era. Pattern Recognition became something CS folks did, and not EE folks. One of the most popular books from that time period is the ~~infamous~~ invaluable Duda & Hart "Pattern Classification" book, which is still a great starting point for young researchers. But don't get too caught up in the vocabulary; it's a bit dated.

The character "3" partitioned into 16 sub-matrices. Custom rules, custom decisions, and custom "smart" programs used to be all the rage.
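
To give a feel for this hand-crafted style, here is a small sketch in the spirit of the 16 sub-matrices above. It is my own toy illustration, not a reconstruction of any historical OCR system: the 8x8 glyphs, the `zone_ink` helper, and the `looks_like_three` rule are all made up for the example. The glyph is split into a 4x4 grid of zones, ink is counted per zone, and a hand-written rule separates "3" from "8" by checking whether the left side is open.

```python
# Toy 8x8 binary glyphs ("1" = ink). These are hand-drawn for illustration.
THREE = [
    "01111100",
    "00000110",
    "00000110",
    "00111100",
    "00000110",
    "00000110",
    "00000110",
    "01111100",
]

EIGHT = [
    "00111100",
    "01100110",
    "01100110",
    "00111100",
    "01100110",
    "01100110",
    "01100110",
    "00111100",
]


def zone_ink(glyph, grid=4):
    """Count ink pixels in each cell of a grid x grid partition of the glyph."""
    size = len(glyph) // grid  # side length of one zone, in pixels
    zones = [[0] * grid for _ in range(grid)]
    for r, row in enumerate(glyph):
        for c, pixel in enumerate(row):
            if pixel == "1":
                zones[r // size][c // size] += 1
    return zones


def looks_like_three(glyph):
    """Hand-written rule: a '3' is open on the left and inked on the right."""
    zones = zone_ink(glyph)
    left_side_open = zones[1][0] == 0 and zones[2][0] == 0
    right_side_inked = zones[1][3] > 0 and zones[2][3] > 0
    return left_side_open and right_side_inked


print(looks_like_three(THREE), looks_like_three(EIGHT))
```

Notice that every threshold and every zone index here was chosen by a person staring at the glyphs. That is exactly the part machine learning later replaced with data.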

Quiz: The most popular Computer Vision conference is called CVPR and the PR stands for Pattern Recognition. Can you guess the year of the first CVPR conference?

2. Machine Learning: Smart programs can learn from examples

Sometime in the early 90s people started realizing that a more powerful way to build pattern recognition algorithms is to replace an expert (who probably knows way too much about pixels) with data (which can be mined from cheap laborers). So you collect a bunch of face images and non-face images, choose an algorithm, and wait for the computations to finish. This is the spirit of machine learning. "Machine Learning" emphasizes that the computer program (or machine) must do some work after it is given data. The Learning step is made explicit. And believe me, waiting 1 day for your computations to finish scales better than inviting your academic colleagues to your home institution to design some classification rules by hand.
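
Here is a minimal sketch of that shift, assuming each image has already been reduced to a two-number feature vector (the numbers below are invented toy data, not real face features). Instead of writing the decision rule by hand, a classic perceptron adjusts its weights whenever it misclassifies a labeled example. The function names and data are mine, chosen only for illustration.

```python
def score(weights, bias, x):
    """Linear score w . x + b for one feature vector."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    """Return +1 or -1 depending on the sign of the linear score."""
    return 1 if score(weights, bias, x) > 0 else -1


def train_perceptron(examples, labels, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge the weights on every mistake."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(examples, labels):
            if predict(weights, bias, x) != y:
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


# Toy, linearly separable data: +1 = "face", -1 = "non-face" (made up).
examples = [(2.0, 1.0), (3.0, 2.0), (2.5, 0.5), (3.5, 1.5),
            (0.5, 2.0), (1.0, 3.0), (0.0, 1.5), (1.0, 2.5)]
labels = [1, 1, 1, 1, -1, -1, -1, -1]

weights, bias = train_perceptron(examples, labels)
print(predict(weights, bias, (4.0, 1.0)), predict(weights, bias, (0.5, 3.0)))
```

The "learning step" is the loop inside `train_perceptron`: the program does work after it is given data, and nobody had to invite colleagues over to design the rule.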

"What is Machine Learning" from Dr Natalia Konstantinova's Blog. The most important part of this diagram is the "Gears," which suggest that crunching/working/computing is an important step in the ML pipeline.

As Machine Learning grew into a major research topic in the mid 2000s, computer scientists began applying these ideas to a wide array of problems. No longer was it only character recognition, cat vs. dog recognition, and other “recognize a pattern inside an array of pixels” problems. Researchers started applying Machine Learning to Robotics (reinforcement learning, manipulation, motion planning, grasping), to genome data, as well as to predict financial markets. Machine Learning was married with Graph Theory under the brand “Graphical Models,” every robotics expert had no choice but to become a Machine Learning Expert, and Machine Learning quickly became one of the most desired and versatile computing skills. However "Machine Learning" says nothing about the underlying algorithm. We've seen convex optimization, Kernel-based methods, Support Vector Machines, as well as Boosting have their winning days. Together with some custom manually engineered features, we had lots of recipes, lots of different schools of thought, and it wasn't entirely clear how a newcomer should select features and algorithms. But that was all about to change...

Further reading: To learn more about the kinds of features that were used in Computer Vision research see my blog post: From feature descriptors to deep learning: 20 years of computer vision.

3. Deep Learning: one architecture to rule them all

Fast forward to today and what we’re seeing is a large interest in something called Deep Learning. The most popular kinds of Deep Learning models, as they are used in large-scale image recognition tasks, are known as Convolutional Neural Nets, or simply ConvNets.

ConvNet diagram from Torch Tutorial

Deep Learning emphasizes the kind of model you might want to use (e.g., a deep convolutional multi-layer neural network) and that you can use data to fill in the missing parameters. But with deep-learning comes great responsibility. Because you are starting with a model of the world which has a high dimensionality, you really need a lot of data (big data) and a lot of crunching power (GPUs). Convolutions are used extensively in deep learning (especially computer vision applications), and the architectures are far from shallow.
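
Since convolutions carry so much of the weight here, a bare-bones sketch may help. This is a plain-Python "valid" 2D sliding-window filter followed by a ReLU, which is the basic building block of a ConvNet layer (like most deep learning code, it does not flip the kernel, so strictly speaking it computes cross-correlation). The tiny image and the hand-picked edge kernel are assumptions for the example; in a real ConvNet the kernel values are the "missing parameters" that get filled in from data.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1, no kernel flip)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Elementwise max(0, x), the usual ConvNet nonlinearity."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# A 4x5 image that is dark on the left and bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(4)]

# Hand-picked 1x2 kernel that responds to a dark-to-bright step.
kernel = [[-1, 1]]

for row in relu(conv2d_valid(image, kernel)):
    print(row)
```

The response is nonzero only in the column where the image steps from dark to bright. Stack many such filters, learn their values instead of picking them, and repeat for several layers, and you have the skeleton of the diagram above.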

If you're starting out with Deep Learning, simply brush up on some elementary Linear Algebra and start coding. I highly recommend Andrej Karpathy's Hacker's guide to Neural Networks. Implementing your own CPU-based backpropagation algorithm on a non-convolution based problem is a good place to start.
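
As a starting point for that exercise, here is a minimal CPU-only sketch of backpropagation on XOR, a classic non-convolutional toy problem. It is one possible way to do it, not taken from Karpathy's guide: the network shape (2 inputs, a few tanh hidden units, one sigmoid output), the squared-error loss, and all function names are my own choices for illustration. Whether a given run actually solves XOR depends on the random seed and learning rate, so treat the printed losses as something to experiment with.

```python
import math
import random


def init_params(n_hidden=3, seed=0):
    """Random small weights for a 2 -> n_hidden -> 1 network."""
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(p, x):
    """Return hidden activations (tanh) and the sigmoid output."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(p["W1"], p["b1"])]
    z = sum(w * hi for w, hi in zip(p["W2"], h)) + p["b2"]
    return h, 1.0 / (1.0 + math.exp(-z))


def loss(p, data):
    """Mean of 0.5 * (output - target)^2 over the dataset."""
    return sum(0.5 * (forward(p, x)[1] - t) ** 2 for x, t in data) / len(data)


def backward(p, x, t):
    """Gradients of 0.5 * (output - target)^2 for one example (chain rule)."""
    h, y = forward(p, x)
    dz = (y - t) * y * (1.0 - y)                 # through the sigmoid
    dh = [dz * w for w in p["W2"]]               # back into the hidden layer
    da = [d * (1.0 - hi * hi) for d, hi in zip(dh, h)]  # through tanh
    return {
        "W1": [[d * xi for xi in x] for d in da],
        "b1": da,
        "W2": [dz * hi for hi in h],
        "b2": dz,
    }


def train(p, data, lr=0.5, epochs=2000):
    """Plain stochastic gradient descent, one example at a time."""
    for _ in range(epochs):
        for x, t in data:
            g = backward(p, x, t)
            for i, row in enumerate(p["W1"]):
                for j in range(len(row)):
                    row[j] -= lr * g["W1"][i][j]
                p["b1"][i] -= lr * g["b1"][i]
                p["W2"][i] -= lr * g["W2"][i]
            p["b2"] -= lr * g["b2"]
    return p


XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

params = init_params()
print("loss before:", loss(params, XOR))
train(params, XOR)
print("loss after:", loss(params, XOR))
```

A good habit when writing your own `backward` is to compare it against a finite-difference estimate of the gradient (nudge one weight by a tiny amount and see how the loss changes). If the two disagree, the bug is almost always in the chain rule.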

There are still lots of unknowns. The theory of why deep learning works is incomplete, and no single guide or book is better than true machine learning experience. There are lots of reasons why Deep Learning is gaining popularity, but Deep Learning is not going to take over the world. As long as you continue brushing up on your machine learning skills, your job is safe. But don't be afraid to chop these networks in half, slice 'n dice at will, and build software architectures that work in tandem with your learning algorithm. The Linux Kernel of tomorrow might run on Caffe (one of the most popular deep learning frameworks), but great products will always need great vision, domain expertise, market development, and most importantly: human creativity.

Other related buzz-words

Big-data is the philosophy of measuring all sorts of things, saving that data, and looking through it for information. For business, this big-data approach can give you actionable insights. In the context of learning algorithms, we’ve only started seeing the marriage of big-data and machine learning within the past few years. Cloud-computing, GPUs, DevOps, and PaaS providers have put large-scale computing within reach of the researcher and the ambitious "everyday" developer.

Artificial Intelligence is perhaps the oldest term, the most vague, and the one that has gone through the most ups and downs in the past 50 years. When somebody says they work on Artificial Intelligence, you are either going to want to laugh at them or take out a piece of paper and write down everything they say.

Further reading: My 2011 Blog post Computer Vision is Artificial Intelligence.

Conclusion

Machine Learning is here to stay. Don't think about it as Pattern Recognition vs Machine Learning vs Deep Learning, just realize that each term emphasizes something a little bit different. But the search continues. Go ahead and explore. Break something. We will continue building smarter software and our algorithms will continue to learn, but we've only begun to explore the kinds of architectures that can truly rule-them-all.

If you're interested in real-time vision applications of deep learning, namely those suitable for robotic and home automation applications, then you should check out what we've been building at vision.ai. Hopefully in a few days, I'll be able to say a little bit more. :-)

Until next time.

See discussion about this blog post on Hacker News.
