Wednesday, April 08, 2015

Deep Learning vs Probabilistic Graphical Models vs Logic

Today, let's take a look at three paradigms that have shaped the field of Artificial Intelligence in the last 50 years: Logic, Probabilistic Methods, and Deep Learning. The empirical, "data-driven", or big-data / deep-learning ideology triumphs today, but that wasn't always the case. Some of the earliest approaches to AI were based on Logic, and the transition from logic to data-driven methods has been heavily influenced by probabilistic thinking, something we will be investigating in this blog post.

Let's take a look back at Logic and Probabilistic Graphical Models and make some predictions about where the field of AI and Machine Learning is likely to go in the near future. We will proceed in chronological order.

Image from Coursera's Probabilistic Graphical Models course

1. Logic and Algorithms (Common-sense "Thinking" Machines)


A lot of early work on Artificial Intelligence was concerned with Logic, Automated Theorem Proving, and manipulating symbols. It should not be a surprise that John McCarthy's seminal 1959 paper on AI had the title "Programs with common sense."

If we peek inside one of the most popular AI textbooks, namely "Artificial Intelligence: A Modern Approach," we immediately notice that the beginning of the book is devoted to search, constraint satisfaction problems, first-order logic, and planning. The third edition's cover (pictured below) looks like a big chess board (because being good at chess used to be a sign of human intelligence) and features a picture of Alan Turing (the father of computing theory) as well as a picture of Aristotle (one of the greatest classical philosophers, who had quite a lot to say about intelligence).

The cover of AIMA, the canonical AI text for undergraduate CS students

Unfortunately, logic-based AI brushes the perception problem under the rug, and I argued quite some time ago that understanding how perception works is really the key to unlocking the secrets of intelligence. Perception is one of those things which is easy for humans and immensely difficult for machines. (To read more, see my 2011 blog post, Computer Vision is Artificial Intelligence.) Logic is pure and traditional chess-playing bots are very algorithmic and search-y, but the real world is ugly, dirty, and ridden with uncertainty.

I think most contemporary AI researchers agree that Logic-based AI is dead. The kind of world where everything can be perfectly observed, a world with no measurement error, is not the world of robotics and big-data.  We live in the era of machine learning, and numerical techniques triumph over first-order logic.  As of 2015, I pity the fool who prefers Modus Ponens over Gradient Descent.
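To make the quip concrete, here is a minimal, self-contained sketch (the rule base, the loss function, and all the numbers are made up for illustration): a single modus ponens step fires a symbolic rule, while gradient descent iteratively nudges a numeric parameter toward a minimum.

```python
# Toy contrast between symbolic inference and numerical optimization.
# Everything here is made up for illustration.

# Modus ponens: from "P" and "P -> Q", conclude "Q".
facts = {"raining"}
rules = [("raining", "ground_is_wet")]  # (premise, conclusion) pairs

for premise, conclusion in rules:
    if premise in facts:
        facts.add(conclusion)
print(facts)  # {'raining', 'ground_is_wet'}

# Gradient descent: minimize the quadratic loss L(w) = (w - 3)^2.
w, learning_rate = 0.0, 0.1
for _ in range(100):
    gradient = 2.0 * (w - 3.0)  # dL/dw
    w -= learning_rate * gradient
print(w)  # converges to roughly 3.0
```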

Logic is great for the classroom, and I suspect that once enough perception problems become "essentially solved," we will see a resurgence in Logic. And while there will be plenty of open perception problems in the future, there will be scenarios where the community can stop worrying about perception and start revisiting these classical ideas. Perhaps in 2020.

Further reading: Logic and Artificial Intelligence from the Stanford Encyclopedia of Philosophy

2. Probability, Statistics, and Graphical Models ("Measuring" Machines)


Probabilistic methods in Artificial Intelligence came out of the need to deal with uncertainty. The middle part of the "Artificial Intelligence: A Modern Approach" textbook is called "Uncertain Knowledge and Reasoning" and is a great introduction to these methods.  If you're picking up AIMA for the first time, I recommend you start with this section. And if you're a student starting out with AI, do yourself a favor and don't skimp on the math.

Intro to PDFs from Penn State's Probability Theory and Mathematical Statistics course

When most people think about probabilistic methods they think of counting.  In layman's terms, it's fair to think of probabilistic methods as fancy counting methods.  Let's briefly take a look at what used to be the two competing methods for thinking probabilistically.

Frequentist methods are very empirical -- these methods are data-driven and make inferences purely from data.  Bayesian methods are more sophisticated and combine data-driven likelihoods with magical priors.  These priors often come from first principles or "intuitions" and the Bayesian approach is great for combining heuristics with data to make cleverer algorithms -- a nice mix of the rationalist and empiricist world views.
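As a tiny, concrete illustration of the difference (a sketch with made-up coin flips, not anyone's official example): the frequentist estimate of a coin's bias is just the empirical frequency, while the Bayesian estimate folds a prior into the same data and returns a full posterior.

```python
# Toy coin-flip example: frequentist vs. Bayesian estimation.
# The data and the prior are made up for illustration.

flips = [1, 1, 0, 1, 0, 1, 1]      # 1 = heads, 0 = tails
heads, tails = sum(flips), len(flips) - sum(flips)

# Frequentist: the maximum-likelihood estimate is the empirical frequency.
p_mle = heads / len(flips)

# Bayesian: a Beta(2, 2) prior ("the coin is probably roughly fair")
# combined with the binomial likelihood gives a Beta posterior.
alpha_prior, beta_prior = 2.0, 2.0
alpha_post = alpha_prior + heads
beta_post = beta_prior + tails
p_posterior_mean = alpha_post / (alpha_post + beta_post)

print(p_mle)             # 0.714... (data only)
print(p_posterior_mean)  # 0.636... (data + prior, pulled toward 0.5)
```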

What is perhaps more exciting than the Frequentist vs. Bayesian flamewar is something known as Probabilistic Graphical Models.  This class of techniques comes from computer science, and even though Machine Learning is now a strong component of both a CS and a Statistics degree, the true power of statistics only comes when it is married with computation.

Probabilistic Graphical Models are a marriage of Graph Theory with Probabilistic Methods and they were all the rage among Machine Learning researchers in the mid-2000s. Variational methods, Gibbs Sampling, and Belief Propagation were being pounded into the brains of CMU graduate students when I was in graduate school (2005-2011) and provided us with a superb mental framework for thinking about machine learning problems. I learned most of what I know about Graphical Models from Carlos Guestrin and Jonathan Huang. Carlos Guestrin is now the CEO of GraphLab, Inc (now known as Dato) which is a company that builds large-scale products for machine learning on graphs and Jonathan Huang is a senior research scientist at Google.
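For readers who have never touched these tools, here is a minimal sketch of Gibbs sampling on a toy two-variable model (the pairwise potential and its strength are made up for illustration); real systems handle thousands of variables, but the inner loop looks essentially like this.

```python
import math
import random

# Toy pairwise model over two binary variables x1, x2 with a coupling
# that prefers them to agree. The potential is made up for illustration.
coupling = 1.0  # larger -> stronger preference for x1 == x2

def conditional_prob_one(other_value):
    """P(x_i = 1 | x_other) under the toy pairwise potential."""
    score_one = math.exp(coupling * (1 if other_value == 1 else -1))
    score_zero = math.exp(coupling * (1 if other_value == 0 else -1))
    return score_one / (score_one + score_zero)

# Gibbs sampling: repeatedly resample each variable from its conditional.
x1, x2 = 0, 1
samples = []
for step in range(10000):
    x1 = 1 if random.random() < conditional_prob_one(x2) else 0
    x2 = 1 if random.random() < conditional_prob_one(x1) else 0
    samples.append((x1, x2))

# Fraction of samples where the variables agree (well above 0.5 here).
agreement = sum(1 for a, b in samples if a == b) / len(samples)
print(agreement)
```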

The video below is a high-level overview of GraphLab, but it serves as a very nice overview of "graphical thinking" and how it fits into the modern data scientist's tool-belt. Carlos is an excellent lecturer and his presentation is less about the company's product and more about ways of thinking about next-generation machine learning systems.

A Computational Introduction to Probabilistic Graphical Models
by GraphLab, Inc CEO Prof. Carlos Guestrin (Video Link updated 4/17/2018)

If you think that deep learning is going to solve all of your machine learning problems, you should really take a look at the above video.  If you're building recommender systems, an analytics platform for healthcare data, designing a new trading algorithm, or building the next generation search engine, Graphical Models are the perfect place to start.

Further reading:
Belief Propagation Algorithm Wikipedia Page
An Introduction to Variational Methods for Graphical Models by Michael Jordan et al.
Michael Jordan's webpage (one of the titans of inference and graphical models)

3. Deep Learning and Machine Learning (Data-Driven Machines)

Machine Learning is about learning from examples and today's state-of-the-art recognition techniques require a lot of training data, a deep neural network, and patience. Deep Learning emphasizes the network architecture of today's most successful machine learning approaches.  These methods are based on "deep" multi-layer neural networks with many hidden layers. NOTE: I'd like to emphasize that using deep architectures (as of 2015) is not new.  Just check out the following "deep" architecture from 1998.

LeNet-5 Figure From Yann LeCun's seminal "Gradient-based learning
applied to document recognition" paper.
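LeNet-5 stacks convolutional and pooling layers followed by fully-connected layers. To make the "many stacked layers" idea concrete, here is a minimal numpy sketch of a deep feed-forward pass (the layer sizes and the input are made up, and plain matrix multiplies stand in for the convolutions to keep it short):

```python
import numpy as np

# Minimal sketch of a "deep" feed-forward pass: several stacked hidden
# layers, each a linear map followed by a nonlinearity. Sizes and data
# are made up for illustration; no training is shown here.
rng = np.random.RandomState(0)

layer_sizes = [784, 300, 100, 10]   # e.g. a flattened 28x28 image -> 10 classes
weights = [rng.normal(scale=0.01, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Propagate a single input vector through all layers."""
    activation = x
    for W, b in zip(weights[:-1], biases[:-1]):
        activation = np.tanh(activation.dot(W) + b)    # hidden layers
    scores = activation.dot(weights[-1]) + biases[-1]  # final linear layer
    exp_scores = np.exp(scores - scores.max())
    return exp_scores / exp_scores.sum()               # softmax probabilities

x = rng.normal(size=784)   # a fake "image"
print(forward(x).shape)    # (10,) class probabilities
```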

When you take a look at a modern guide to LeNet, it comes with the following disclaimer:

"To run this example on a GPU, you need a good GPU. It needs at least 1GB of GPU RAM. More may be required if your monitor is connected to the GPU.

When the GPU is connected to the monitor, there is a limit of a few seconds for each GPU function call. This is needed as current GPUs can’t be used for the monitor while performing computations. Without this limit, the screen would freeze for too long and make it look as if the computer froze. This example hits this limit with medium-quality GPUs. When the GPU isn’t connected to a monitor, there is no time limit. You can lower the batch size to fix the timeout problem."

It really makes me wonder how Yann was able to get anything out of his deep model back in 1998. Perhaps it's not surprising that it took another decade for the rest of us to get the memo.

UPDATE: Yann pointed out (via a Facebook comment) that the ConvNet work dates back to 1989. "It had about 400K connections and took about 3 weeks to train on the USPS dataset (8000 training examples) on a SUN4 machine." -- LeCun



NOTE: At roughly the same time (~1998) two crazy guys in California were trying to cache the entire internet inside the computers in their garage (they started some funny-sounding company which starts with a G). I don't know how they did it, but I guess sometimes to win big you have to do things that don't scale. Eventually, the world will catch up.

Further reading:
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, November 1998.

Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel: Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, Winter 1989

Deep Learning code: Modern LeNet implementation in Theano and docs.


Conclusion

I don't see traditional first-order logic making a comeback anytime soon. And while there is a lot of hype behind deep learning, distributed systems and "graphical thinking" are likely to make a much more profound impact on data science than heavily optimized CNNs. There is no reason why deep learning can't be combined with a GraphLab-style architecture, and some of the most exciting machine learning work of the next decade is likely to be a marriage of these two philosophies.


You can also check out a relevant post from last month:
Deep Learning vs Machine Learning vs Pattern Recognition

Discuss on Hacker News

9 comments:

  1. Anonymous 7:17 AM

    Apparently, you know nothing about logic-based AI. Logic has long advanced beyond what you describe ("modus ponens", wtf??) - there are probabilistic logics, statistical-relational learning, neuro-symbolic representation, stochastic logic, first-order graphical models and many other logic-based approaches which allow for statistical learning, dealing with partial observability, uncertainty and noise, etc. They have been successfully applied in various real-world areas such as biomedical research and large-scale social network analysis. I suggest you make yourself familiar with the state of the art in research before writing rubbish.

    1. Anonymous 8:44 PM

      Well, what about writing a competing paper to explain the part that was forgotten, instead of saying "wtf" at people? Any point of view is welcome.

  2. When I mention logic in the blog post I am really referring to first-order logic.

    I remember studying skolemization in my AI class as an undergraduate. And in my 10 years of robotics research I haven't seen it used once.

    Probabilistic methods are broad and this blog post is actually defending those methods over black-box deep learning methods.

    We don't use the term logic the way you'd be happy with. We don't call probabilistic methods "probabilistic logics."

    Feel free to post links to some of your published papers and/or articles. And thanks for the insightful comment.

    1. > I remember studying skolemization in my AI class as an undergraduate. And in my 10 years of robotics research I haven't seen it used once.

      This is, for the most part, true. Logics are still widely used in higher forms of AI than just perception and control. Doug Lenat's work is still primarily logic-based.

      Logic seems to be the win when you're trying to do higher-order cognition. Its representation is far more efficient in space and time complexity than these deep methods.

      I think what you see is that robotics research has not been able to make the case that higher forms of cognition are needed for everyday robotics tasks.

  3. I need to agree with Anonymous, here.

    Your post repeats a popular narrative -- one that's simplified to the point of being completely wrong. Logic programming isn't a casualty of the AI winter. Furthermore, first order logic is not representative of the variety of logical systems in common use -- it's merely the easiest one to teach to freshmen.

    Logic programming has clear benefits over statistical methods for particular sets of applications. After all, where statistical methods are completely opaque, the equivalent logic is typically human-readable. And, of course, logic programming continues to dominate in the tasks where it has always been dominant -- expert systems, because it makes no sense to try to get a model to learn patterns when a human being has already learned the patterns and can trivially codify them; theorem proving and constraint solving, because the starting point is formal and the inferences are formal.

    Probabilistic methods are not probabilistic logics. This seems to be a blind spot in your background -- a probabilistic logic is a system that combines logic-style inference rules with fractional truth values, thus combining the expressive clarity of logic with the expressive completeness of probability. (A good example is PLN, which Goertzel has written on.)

    Certainly, statistical methods are easier to use when producing 'theory-free' models empirically. But, we rarely want to be truly 'theory-free' -- doing so means putting us at the mercy of obvious absurdities produced by biased samples. Logic programming has its place performing automatic sanity checks, and flagging situations to the developer when the statistical model strays outside the bounds of what some logical and human-readable model considers reasonable.

    Logic also has its place in didacticism. After all, logic is readily comprehensible to students.

    Certainly, nobody could reasonably make the case that logic programming should fully displace statistical models! But, it's equally unreasonable to suggest that statistical models should be used when logic programming can do it better. And, to suggest that formal logic should be consigned to the past is to ignore the actual situation in industry and in academia in favor of a misunderstanding of history: statistical models did not rise in popularity because of a growing failure of logical models, but because logical models operate on a human scale while statistical models operate on an inhuman scale, and so statistical models gain popularity as the inhuman scale at which they must operate becomes more practical. The set of situations wherein statistical models are displacing formal models is the set of situations wherein formal models would have never been used, had resources not been a factor.

  4. Anonymous 2:10 AM

    Confusing article: AI is not only classification: classification is just one intermediate part of the AI chain.

    We miss what is true core AI: knowledge formation, creation of thought processes, rooting of thought processes in observed data, interconnection of thought processes over time to form more elaborate ones, along with their harnessing against observed data over time.

    If you want to start from something practical: create a board scene with items interacting between each other using arbitrary rules and try to write a program with minimal prior knowledge that can understand how the items interact between each other. Have fun!

  5. Anonymous 4:05 AM

    Considering you majored in Computer Vision, your argument is understandable.
    However, it would be better to expand your definition of AI beyond classification tasks like MNIST.

    I recommend reading 'The Master Algorithm' by Pedro Domingos.
    It will give you a bird's-eye view.

    Pedro Domingos is a leading machine learning researcher.
    He recently developed "Tractable Deep Learning" ( http://www.cs.washington.edu/node/8805 )
    His approach combines first-order logic and probabilistic graphical models in a single representation.
    Have fun!

  6. Anonymous 9:56 AM

    In short, logics are all about proof, while probabilistic/statistical data-processing tools and techniques (such as deep neural nets) are all about "exploratory maths" (data clustering, ...). But I really do not agree with the perspective about logic: it is the calculus of computer science and, as such, its foundations...
