Friday, March 20, 2015

Deep Learning vs Machine Learning vs Pattern Recognition

Let's take a close look at three related terms (Deep Learning vs Machine Learning vs Pattern Recognition), and see how they relate to some of the hottest tech themes of 2015 (namely Robotics and Artificial Intelligence). In our short journey through jargon, you should acquire a better understanding of how computer vision fits in, as well as gain an intuitive feel for how the machine learning zeitgeist has slowly evolved over time.

Fig 1. Putting a human inside a computer is not Artificial Intelligence
(Photo from WorkFusion Blog)

If you look around, you'll see no shortage of jobs at high-tech startups looking for machine learning experts. While only a fraction of them are looking for Deep Learning experts, I bet most of these startups can benefit from even the most elementary kind of data scientist. So how do you spot a future data-scientist? You learn how they think. 

The three highly-related "learning" buzz words

“Pattern recognition,” “machine learning,” and “deep learning” represent three different schools of thought.  Pattern recognition is the oldest (and as a term is quite outdated). Machine Learning is the most fundamental (one of the hottest areas for startups and research labs as of today, early 2015). And Deep Learning is the new, the big, the bleeding-edge -- we’re not even close to thinking about the post-deep-learning era.  Just take a look at the following Google Trends graph.  You'll see that a) Machine Learning is rising like a true champion, b) Pattern Recognition started as synonymous with Machine Learning, c) Pattern Recognition is dying, and d) Deep Learning is new and rising fast.



1. Pattern Recognition: The birth of smart programs

Pattern recognition was a term popular in the 70s and 80s. The emphasis was on getting a computer program to do something “smart” like recognize the character "3". And it really took a lot of cleverness and intuition to build such a program. Just think of "3" vs "B" and "3" vs "8".  Back in the day, it didn't really matter how you did it as long as there was no human-in-a-box pretending to be a machine. (See Figure 1)  So if your algorithm would apply some filters to an image, localize some edges, and apply morphological operators, it was definitely of interest to the pattern recognition community.  Optical Character Recognition grew out of this community, and it is fair to call “Pattern Recognition” the “smart” signal processing of the 70s, 80s, and early 90s. Decision trees, heuristics, quadratic discriminant analysis, etc all came out of this era. Pattern Recognition became something CS folks did, and not EE folks.  One of the most popular books from that time period is the invaluable Duda & Hart "Pattern Classification" book, and it is still a great starting point for young researchers.  But don't get too caught up in the vocabulary; it's a bit dated.



The character "3" partitioned into 16 sub-matrices. Custom rules, custom decisions, and custom "smart" programs used to be all the rage. 


Quiz: The most popular Computer Vision conference is called CVPR, and the PR stands for Pattern Recognition.  Can you guess the year of the first CVPR conference?

2. Machine Learning: Smart programs can learn from examples

Sometime in the early 90s people started realizing that a more powerful way to build pattern recognition algorithms is to replace an expert (who probably knows way too much about pixels) with data (which can be mined from cheap laborers).  So you collect a bunch of face images and non-face images, choose an algorithm, and wait for the computations to finish.  This is the spirit of machine learning.  "Machine Learning" emphasizes that the computer program (or machine) must do some work after it is given data.  The Learning step is made explicit.  And believe me, waiting 1 day for your computations to finish scales better than inviting your academic colleagues to your home institution to design some classification rules by hand.
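
In code, the spirit of that shift looks roughly like this (a sketch using scikit-learn; the two patch-loading helpers are hypothetical stand-ins for whatever labeled data you collected):

    import numpy as np
    from sklearn.svm import LinearSVC

    # Hypothetical helpers: each returns an array of grayscale patches of shape (n, 24, 24).
    faces = load_face_patches()          # positives, labeled by humans
    non_faces = load_non_face_patches()  # negatives, e.g. random background crops

    X = np.vstack([faces.reshape(len(faces), -1),
                   non_faces.reshape(len(non_faces), -1)]).astype(np.float32)
    y = np.concatenate([np.ones(len(faces)), np.zeros(len(non_faces))])

    clf = LinearSVC(C=1.0)     # pick an algorithm...
    clf.fit(X, y)              # ...and let the machine do the crunching
    print(clf.predict(X[:5]))  # 1 = face, 0 = non-face

The expert's job shrinks to collecting labeled examples and picking an algorithm; the machine does the rest.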


"What is Machine Learning" from Dr Natalia Konstantinova's Blog. The most important part of this diagram are the "Gears" which suggests that crunching/working/computing is an important step in the ML pipeline.

As Machine Learning grew into a major research topic in the mid-2000s, computer scientists began applying these ideas to a wide array of problems.  No longer was it only character recognition, cat vs. dog recognition, and other “recognize a pattern inside an array of pixels” problems.  Researchers started applying Machine Learning to Robotics (reinforcement learning, manipulation, motion planning, grasping), to genome data, and to predicting financial markets.  Machine Learning was married with Graph Theory under the brand “Graphical Models,” every robotics expert had no choice but to become a Machine Learning expert, and Machine Learning quickly became one of the most desired and versatile computing skills.  However, "Machine Learning" says nothing about the underlying algorithm.  We've seen convex optimization, kernel-based methods, Support Vector Machines, and Boosting all have their winning days.  Together with some custom manually engineered features, we had lots of recipes, lots of different schools of thought, and it wasn't entirely clear how a newcomer should select features and algorithms.  But that was all about to change...
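
Before that change arrived, a typical recipe paired one hand-engineered feature with one of several learning algorithms, and practitioners mixed and matched. A rough sketch of such a recipe (assuming `images`, an array of 64x64 grayscale images, and `labels` already exist; HOG is just one popular feature choice of the era):

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import SVC
    from sklearn.ensemble import AdaBoostClassifier

    def hog_features(imgs):
        # Histogram of Oriented Gradients: a classic hand-engineered feature.
        return np.array([hog(im, orientations=9, pixels_per_cell=(8, 8),
                             cells_per_block=(2, 2)) for im in imgs])

    X = hog_features(images)

    # Pick your school of thought: a kernel machine or a boosting ensemble.
    for clf in [SVC(kernel="rbf", C=10.0), AdaBoostClassifier(n_estimators=100)]:
        clf.fit(X, labels)
        print(type(clf).__name__, clf.score(X, labels))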

Further reading: To learn more about the kinds of features that were used in Computer Vision research see my blog post: From feature descriptors to deep learning: 20 years of computer vision.

3. Deep Learning: one architecture to rule them all

Fast forward to today and what we’re seeing is a large interest in something called Deep Learning. The most popular kinds of Deep Learning models, as they are used in large-scale image recognition tasks, are known as Convolutional Neural Nets, or simply ConvNets. 


ConvNet diagram from Torch Tutorial

Deep Learning emphasizes the kind of model you might want to use (e.g., a deep convolutional multi-layer neural network) and that you can use data to fill in the missing parameters.  But with deep learning comes great responsibility.  Because you are starting with a high-dimensional model of the world, you really need a lot of data (big data) and a lot of crunching power (GPUs). Convolutions are used extensively in deep learning (especially in computer vision applications), and the architectures are far from shallow.
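
To see what "deep convolutional" means at the level of a single building block, here is a minimal NumPy sketch of one convolution + ReLU + pooling stage. A ConvNet stacks many of these; in a real system (Caffe, Torch) the filters are learned from data and everything runs on the GPU, so the random filter here is only meant to show the shapes:

    import numpy as np

    def conv2d(image, kernel):
        # Naive 'valid' 2D convolution (really cross-correlation, as in most DL frameworks).
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
        return out

    def relu(x):
        return np.maximum(x, 0)

    def max_pool(x, size=2):
        H, W = x.shape
        H, W = H - H % size, W - W % size  # crop so the grid divides evenly
        x = x[:H, :W].reshape(H // size, size, W // size, size)
        return x.max(axis=(1, 3))

    image = np.random.rand(32, 32)        # a single-channel input
    kernel = np.random.randn(5, 5) * 0.1  # in a trained ConvNet, this filter is learned
    feature_map = max_pool(relu(conv2d(image, kernel)))
    print(feature_map.shape)              # (14, 14)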

If you're starting out with Deep Learning, simply brush up on some elementary Linear Algebra and start coding.  I highly recommend Andrej Karpathy's Hacker's guide to Neural Networks. Implementing your own CPU-based backpropagation algorithm on a non-convolution based problem is a good place to start.
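
In that spirit, here is a minimal CPU backpropagation sketch in NumPy (my own toy translation of the idea, not code from Karpathy's guide): a tiny two-layer network trained on XOR, a classic non-convolutional problem that a linear model can't solve.

    import numpy as np

    rng = np.random.RandomState(0)

    # XOR inputs and targets.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # Two-layer network: 2 inputs -> 8 hidden units -> 1 output.
    W1, b1 = rng.randn(2, 8) * 0.5, np.zeros(8)
    W2, b2 = rng.randn(8, 1) * 0.5, np.zeros(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 1.0
    for step in range(5000):
        # Forward pass.
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)

        # Backward pass (cross-entropy loss): the chain rule, applied by hand.
        dlogits = (p - y) / len(X)
        dW2, db2 = h.T @ dlogits, dlogits.sum(axis=0)
        dh = dlogits @ W2.T * (1 - h ** 2)  # derivative of tanh
        dW1, db1 = X.T @ dh, dh.sum(axis=0)

        # Plain gradient descent update.
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    print(np.round(p, 2))  # should approach [[0], [1], [1], [0]]

Once gradients like these feel natural, moving to convolutional layers and GPU frameworks is mostly a matter of bookkeeping and scale.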

There are still lots of unknowns. The theory of why deep learning works is incomplete, and no single guide or book is better than true machine learning experience.  There are lots of reasons why Deep Learning is gaining popularity, but Deep Learning is not going to take over the world.  As long as you continue brushing up on your machine learning skills, your job is safe. But don't be afraid to chop these networks in half, slice 'n dice at will, and build software architectures that work in tandem with your learning algorithm.  The Linux Kernel of tomorrow might run on Caffe (one of the most popular deep learning frameworks), but great products will always need great vision, domain expertise, market development, and most importantly: human creativity.

Other related buzz-words

Big-data is the philosophy of measuring all sorts of things, saving that data, and looking through it for information.  For business, this big-data approach can give you actionable insights.  In the context of learning algorithms, we’ve only started seeing the marriage of big-data and machine learning within the past few years.  Cloud-computing, GPUs, DevOps, and PaaS providers have made large scale computing within reach of the researcher and ambitious "everyday" developer. 

Artificial Intelligence is perhaps the oldest term, the most vague, and the one that has gone through the most ups and downs over the past 50 years. When somebody says they work on Artificial Intelligence, you are either going to want to laugh at them or take out a piece of paper and write down everything they say.

Further reading: My 2011 Blog post Computer Vision is Artificial Intelligence.

Conclusion

Machine Learning is here to stay. Don't think about it as Pattern Recognition vs Machine Learning vs Deep Learning; just realize that each term emphasizes something a little bit different.  But the search continues.  Go ahead and explore. Break something. We will continue building smarter software, and our algorithms will continue to learn, but we've only begun to explore the kinds of architectures that can truly rule them all.

If you're interested in real-time vision applications of deep learning, namely those suitable for robotic and home automation applications, then you should check out what we've been building at vision.ai. Hopefully in a few days, I'll be able to say a little bit more. :-)

Until next time.




23 comments:

  1. The "Pattern Classification" book which I mentioned in this blog post is the orange/red book which looks like this: http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471056693.html
    It's really a great book (even though it is a bit dated).

    A more recent machine learning textbook is Christopher Bishop's "Pattern Recognition and Machine Learning," which is still used in ML classrooms today.

    http://research.microsoft.com/en-us/um/people/cmbishop/prml/

  2. Thank you for great article!

  3. There will be some posts describing what we've been up to with vision.ai, but here are 10 Beta keys for the vision.ai software. Please only use one, and be warned that they are first come first served. I will post an update when they are all gone:

    6db56b8c-b775-4516-9843-f186ab3cf938
    182cddf3-78f7-4aeb-abeb-e47166142f34
    4f2b41f5-26d5-4175-8920-c499c9c62e28
    72c2a7f3-2e1f-4761-8f9b-424c86926e6b
    1393dcca-057a-4f32-b840-daa4dd1f51a7
    6bdcd723-3fb2-4c6b-816f-72f13a4c0b4d
    f40125e9-85fe-4b5b-8de0-ba9ceeacafa8
    f61d71b8-676b-4650-8fe5-efd320b39a2b
    30d58085-e25c-4b30-abf6-3871ae0bde8d
    cfd23407-0ec1-483c-bae5-b3653838b5f0

    Replies
    1. Where to use them?

    2. You can use the license keys (each line is a separate key) to activate our software which is in public beta and download links are found on our website:

      https://beta.vision.ai/purchase

      We have a Docker based installer for Linux, a native Mac OS X installer, and activation happens in the GUI after installation.

      Only one license code left:
      1393dcca-057a-4f32-b840-daa4dd1f51a7

    3. Here are ten more activation licenses:

      2a11f0e9-f2c8-4bd6-9c72-9581c7ce6f3c
      7af7f61d-9c1c-4716-b642-863a4781a113
      17efd11b-7d7b-4e78-9d2d-7660d1e270fc
      055065a7-78d4-436f-a0be-59011fcd1299
      46efc1e9-7ad0-4f40-af6f-fc2aa801062a
      a01ac9e2-cfa8-4f55-ae12-b6a27c9be128
      c763b38c-22ed-4d56-8b50-6ca03452cce7
      4d60120b-c5de-4e27-b77f-577b53c46a5e
      e557188b-8985-4a84-b830-c1d5e681947f
      3349f646-54ad-4825-ade7-66e00a69b122

    4. As of Monday 3/23/2015 8:00pm ET, there are only 5 valid activation codes left:

      1393dcca-057a-4f32-b840-daa4dd1f51a7
      2a11f0e9-f2c8-4bd6-9c72-9581c7ce6f3c
      7af7f61d-9c1c-4716-b642-863a4781a113
      055065a7-78d4-436f-a0be-59011fcd1299
      46efc1e9-7ad0-4f40-af6f-fc2aa801062a

    5. As of Wednesday 3/25/2015, only three activation codes left:

      2a11f0e9-f2c8-4bd6-9c72-9581c7ce6f3c
      7af7f61d-9c1c-4716-b642-863a4781a113
      46efc1e9-7ad0-4f40-af6f-fc2aa801062a

      Thanks guys, if you've activated with your email, we'll let you know about the launch and any other big announcements.

  4. Thanks for eating up some of these licenses, folks. They work for the current version and will work for all future updates of our VMX software. The tutorials will be out soon, but if you're eager then check it out. 5 licenses left.

    If you sign up for the vision.ai mailing list, we will keep you updated with news of our imminent "big launch".

    You can train detectors using our GUI and get a taste of what next generation Machine Learning is all about. But you don't actually need to know anything about Machine Learning to start training your own object recognition models.

  5. Anonymous

    Tried it. Great tool, but poor UI :( A lot still has to be done, somewhat like the machine learning module on Azure.

    Replies
    1. Thanks for the feedback. Will have to check out the Azure ML module. We are focusing on object detection: definitely need to clean up some of the tracker noise, add instructions to the GUI, and go through more user tests to make it more intuitive. As the tool grew out of one man's journey (BS/MS/PhD/Postdoc/Bootstrapper) through computer vision, it did have the unfortunate start as an expert tool made by one expert for his friends. :-)

    2. What's the old adage? If you aren't embarrassed by your 1.0 release, you waited too long to release it? Love the bootstrap mentality.

    3. I agree with you Adrian. But it's not easy being a perfectionist and knowing two conflicting things: 1) that your early version is going to be rough around the edges and 2) the only way to make it good is to have people use it.

      We're at version v0.2, and can't wait to raise a little so we can get a UX designer in-house!

  6. OK, now I'm wondering: how is deep learning different (or distinct?) from just using artificial neural networks (ANNs; which have been used as models in machine learning since the inception of this field)?

    Say, compared to the MLP (multilayer perceptron ANN -- as traditional as it gets, I suppose) -- what's new in deep learning?

    Is it the kind of neural network (ConvNets instead of MLPs?) -- is it the number or size of layers (but that's not really new then, since large, multilayer MLPs are nothing new, either)?

    Or, is it the way they're being trained (hardware choices, like GPUs -- as well as algorithmic choices, like stochastic gradient descent; or ways of avoiding numerical difficulties like the vanishing gradient problem, bringing us to LSTM and pretraining)? If so, what exactly about any of these developments "identifies" whether one is using deep learning -- and how?

    It's not easy for me to disambiguate between ANNs and deep learning at the moment -- for instance, everything in Andrej Karpathy's guide is just good, old, traditional machine learning applying classic ANNs.

    Any more details would be greatly appreciated! :-)

  7. Hi MD,

    You're rightfully confused. The boundary between multilayer perceptron ANNs and deep learning systems is blurry, and because working on Deep Learning is so fashionable as of late, lots of systems are claiming to be deep.

    Deep learning vs old ANNs is based on 3 things:

    1.) the amount of data involved. ImageNet is orders of magnitude larger than the kind of MNIST-style datasets that were popular in the 80s and 90s.

    2.) computers are much faster. It's too late for me to do the back-of-the-envelope calculation, but I wouldn't be surprised if we're running today's learning algorithms 1000x longer than we previously thought necessary.

    3.) algorithms are a little bit smarter. There are tricks like ReLU activation functions and dropout, but they probably aren't as important as good ol' minibatch gradient descent and a lot of patience.
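
    For the curious, those two tricks fit in a few lines of NumPy (a sketch, using the common "inverted dropout" convention):

        import numpy as np

        def relu(x):
            # ReLU: pass positives through, zero out negatives.
            return np.maximum(x, 0)

        def dropout(x, p=0.5, training=True, rng=np.random):
            # Dropout: randomly zero units at train time; rescale so the
            # expected activation matches at test time (inverted dropout).
            if not training:
                return x
            mask = (rng.rand(*x.shape) > p) / (1.0 - p)
            return x * mask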

    Yisong Yue gives a great overview of Deep Learning on his Machine Learning blog, and the commentary is full of feedback from some of the biggest names in the field. Here is the link: http://yyue.blogspot.com/2015/01/a-brief-overview-of-deep-learning.html

    An intimate understanding of Deep Learning requires that you know the insides of classic ANNs and backprop like the back of your hand. It's still the best place to start. Karpathy's guides are good.

    Replies
    1. Thanks, Ilya's post / Yisong's blog look interesting!

      Seems like it's mostly #3 that can be described as "doing something [slightly] different [or evolved]" -- i.e., #1 & #2 suggest the changes are solely in the "how we do it" category, while we're still in the same "what we do" category. Essentially, it's still a subfield of machine learning -- still just applying artificial neural networks -- but now on more data / with more/larger layers / thanks to faster computers. In other words, "deep learning" could perhaps be succinctly described as "modern techniques and practices revolving around the ways we currently happen to apply ANNs." Is this fair enough?

    2. Hi MD,

      Sometimes old tricks with new computers do work out!

      I think it's fair to think of deep learning as "modern techniques and practices revolving around the ways we currently happen to apply ANNs." Even though it is sometimes branded as this new cutting-edge artificial intelligence technique, the intelligence is in the data. No big data == no deep learning.

  8. Thank you for the article. For people like me, trying to do more with computers in countries that are still developing, this is important.

  9. Deep Learning is a type of Machine Learning, which is a technique for Pattern Recognition.

  10. Tomasz: Excellent summary, it will help a lot of people get grounded in the field. Of course there are numerous ways to organize the topics, but the differentiating aspects are well presented. Nice job!

  11. Very inspiring!
    Thank you :)
