Showing posts with label mathematics. Show all posts

Saturday, April 04, 2015

Three Fundamental Dimensions for Thinking About Machine Learning Systems

Today, let's set cutting-edge machine learning and computer vision techniques aside. You probably already know that computer vision (or "machine vision") is the branch of computer science / artificial intelligence concerned with recognizing objects like cars, faces, and hand gestures in images. And you also probably know that machine learning algorithms are used to drive state-of-the-art computer vision systems. But what's missing is a bird's-eye view of how to think about designing new learning-based systems. So instead of focusing on today's trendiest machine learning techniques, let's go all the way back to day 1 and build ourselves a strong foundation for thinking about machine learning and computer vision systems.




Allow me to introduce three fundamental dimensions which you can follow to obtain computer vision mastery. The first dimension is mathematical, the second is verbal, and the third is intuitive.

On a personal level, most of my daily computer vision activities directly map onto these dimensions. When I'm at a coffee shop, I prefer the mathematical - pen and paper are my weapons of choice. When it's time to get ideas out of my head, there's nothing like a solid founder-founder face-to-face meeting, an occasional MIT visit to brainstorm with my scientist colleagues, or simply rubberducking (rubber duck debugging) with developers. And when it comes to engineering, interacting with a live learning system can help develop the intuition necessary to make a system more powerful, more efficient, and ultimately much more robust.

Mathematical: Learn to love the linear classifier

At the core of machine learning is mathematics, so you shouldn't be surprised that I include mathematical as one of the three fundamental dimensions of thinking about computer vision.

The single most important concept in all of machine learning which you should master is the idea of the classifier. For some of you, classification is a well-understood problem; however, too many students prematurely jump into more complex algorithms like randomized decision forests and multi-layer neural networks, without first grokking the power of the linear classifier. Plenty of data scientists will agree that the linear classifier is the most fundamental machine learning algorithm. In fact, when Peter Norvig, Director of Research at Google, was asked "Which AI field has surpassed your expectations and surprised you the most?" in his 2010 interview, he answered with "machine learning by linear separators." 

The illustration below depicts a linear classifier. In two dimensions, a linear classifier is a line which separates the positive examples from the negative examples.  You should first master the 2D linear classifier, even though in most applications you'll need to explore a higher-dimensional feature space. My personal favorite learning algorithm is the linear support vector machine, or linear SVM. In an SVM, overly-confident data points do not influence the decision boundary. Or put another way, learning proceeds as if those confident points weren't even there! This is a very useful property for large-scale learning problems where you can't fit all data into memory. You're going to want to master the linear SVM (and how it relates to Linear Discriminant Analysis, Linear Regression, and Logistic Regression) if you're going to pass one of my whiteboard data-science interviews.
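To make the 2D picture concrete, here is a minimal sketch of a linear classifier's decision rule, with illustrative (not learned) weights: a point's label is just the sign of w . x + b.

```python
import numpy as np

# In 2D, a linear classifier is a weight vector w and a bias b; the line
# w . x + b = 0 is the decision boundary, and the sign of the score
# tells you which side of the line a point falls on.
w = np.array([1.0, -1.0])   # illustrative weights, not learned
b = 0.5

def classify(x):
    """Return +1 for the positive side of the boundary, -1 otherwise."""
    return 1 if np.dot(w, x) + b > 0 else -1

# Two example points, one on each side of the line.
print(classify(np.array([2.0, 0.0])))    # score = 2.5, positive side
print(classify(np.array([-2.0, 2.0])))   # score = -3.5, negative side
```

Everything else (SVMs, logistic regression, even the neurons below) is a variation on this two-line decision rule.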


Linear Support Vector Machine from the SVM Wikipedia page


An intimate understanding of the linear classifier is necessary to understand how deep learning systems work.  The neurons inside a multi-layer neural network are little linear classifiers, and while the final decision boundary is non-linear, you should understand the underlying primitives very well. Loosely speaking, you can think of the linear classifier as a simple spring system and more complex classifiers as higher-order assemblies of springs.


Also, there are going to be scenarios in your life as a data-scientist where a linear classifier should be the first machine learning algorithm you try. So don't be afraid to use some pen and paper, get into that hinge loss, and master the fundamentals.
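When you do grab that pen and paper, a minimal numpy sketch of the hinge loss and sub-gradient descent (toy data and step size of my own choosing, not a production solver) looks like this:

```python
import numpy as np

# Hinge loss for one example: max(0, 1 - y * (w . x + b)).
# Examples with margin >= 1 contribute nothing -- this is the
# "overly-confident points don't influence the boundary" property.
def hinge_loss(w, b, X, y):
    margins = y * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins).mean()

# A tiny linearly separable toy set: two positives, two negatives.
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Sub-gradient descent on the (unregularized) hinge loss: only points
# that violate the margin push on w and b.
w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(200):
    active = y * (X @ w + b) < 1.0
    if not active.any():
        break                      # every point clears the margin
    w += lr * (y[active, None] * X[active]).sum(axis=0) / len(X)
    b += lr * y[active].sum() / len(X)
```

A real SVM adds a regularization term on w, but the margin-violation logic above is the heart of it.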

Further reading: Google's Research Director talks about Machine Learning. Peter Norvig's Reddit AMA on YouTube from 2010.
Further reading: A demo for playing with linear classifiers in the browser. Linear classifier Javascript demo from Stanford's CS231n: Convolutional Neural Networks for Visual Recognition.
Further reading: My blog post: Deep Learning vs Machine Learning vs Pattern Recognition

Verbal: Talk about your vision (and join a community)

As you start acquiring knowledge of machine learning concepts, the best way forward is to speak up. Learn something, then teach a friend. As counterintuitive as it sounds, when it comes down to machine learning mastery, human-human interaction is key. This is why getting a ML-heavy Masters or PhD degree is ultimately the best bet for those adamant about becoming pioneers in the field. Daily conversations are necessary to strengthen your ideas.  See Raphael's "The School of Athens" for a depiction of what I think of as the ideal learning environment.  I'm sure half of those guys were thinking about computer vision.


An ideal ecosystem for collaboration and learning about computer vision


If you're not ready for a full-time graduate-level commitment to the field, consider a.) taking an advanced undergraduate course in vision/learning at your university, b.) taking a machine learning MOOC, or c.) joining a practical and application-focused online community/course focusing on computer vision.

During my 12-year academic stint, I made the observation that talking to your peers about computer vision and machine learning is more important than listening to teachers/supervisors/mentors.  Of course, there's much value in having a great teacher, but don't be surprised if you get 100x more face-to-face time with your friends compared to student-teacher interactions.  So if you take an online course like Coursera's Machine Learning MOOC, make sure to take it with friends.  Pause the video and discuss. Go to dinner and discuss. Write some code and discuss. Rinse, lather, repeat.

Coursera's Machine Learning MOOC taught by Andrew Ng


Another great opportunity is to follow Adrian Rosebrock's pyimagesearch.com blog, where he focuses on python and computer vision applications.  

Further reading: Old blog post: Why your vision lab needs a reading group

Homework assignment: Find somebody on the street and teach them about machine learning.

Intuitive: Play with a real-time machine learning system

The third and final dimension is centered around intuition. Intuition is the ability to understand something immediately, without the need for conscious reasoning. The following guidelines are directed towards real-time object detection systems, but can also transfer over to other applications like learning-based attribution models for advertisements, high-frequency trading, as well as numerous tasks in robotics.

To gain some true insights about object detection, you should experience a real-time object detection system.  There's something unique about seeing a machine learning system run in real-time, right in front of you.  And when you get to control the input to the system, such as when using a webcam, you can learn a lot about how the algorithms work.  For example, seeing the classification score go down as you occlude the object of interest, and seeing the detection box go away when the object goes out of view is fundamental to building intuition about what works and what elements of a system need to improve.
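You can even mimic that feedback loop at your desk. Below is a toy sketch (a fake frame source and a fake detector of my own invention, nothing like the real VMX API) showing the detection score falling as more of the object is occluded, exactly the cause-and-effect you observe live with a webcam:

```python
import numpy as np

# Toy stand-ins for a webcam and a detector -- purely illustrative.
# The point is the feedback loop: frame in, score out, human observes.
def make_frame(occlusion=0.0):
    """A fake 8x8 'image' of an object, with a fraction of it occluded."""
    frame = np.ones((8, 8))
    hidden = int(occlusion * frame.size)
    frame.flat[:hidden] = 0.0        # zero out the occluded pixels
    return frame

def detection_score(frame):
    """A fake detector: here the score is just the visible fraction."""
    return frame.mean()

# Slide an occluder over the object and watch the score drop.
scores = [detection_score(make_frame(occ)) for occ in (0.0, 0.25, 0.5, 0.75)]
print(scores)
```

With a real detector the relationship is far messier, and seeing exactly where the score breaks down is where the intuition comes from.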

I see countless students tweaking an algorithm, applying it to a static large-scale dataset, and then waiting for the precision-recall curve to be generated. I understand that this is the hard and scientific way of doing things, but unless you've already spent a few years making friends with every pixel, you're unlikely to make a lasting contribution this way. And it's not very exciting -- you'll probably fall asleep at your desk.

Using a real-time feedback loop (see illustration below), you can learn about the patterns which are intrinsically difficult to classify, as well as what environmental variations (lights, clutter, motion) affect your system the most.  This is something which really cannot be done with a static dataset.  So go ahead, mine some intuition and play.
Visual Debugging: Designing the vision.ai real-time gesture-based controller in Fall 2013

Visual feedback is where our work at vision.ai truly stands out. Take a look at the following video, where we show a live example of training and playing with a detector based on vision.ai's VMX object recognition system.


NOTE: There are a handful of other image recognition systems out there which you can turn into real-time vision systems, but be warned that optimization for real-time applications requires some non-trivial software engineering experience.  We've put a lot of care into our system so that the detection scores are analogous to a linear SVM scoring strategy. Making the output of a non-trivial learning algorithm backwards-compatible with a linear SVM isn't always easy, but in my opinion, well worth the effort.

Extra Credit: See comments below for some free VMX by vision.ai beta software licenses so you can train some detectors using our visual feedback interface and gain your own machine vision intuition.

Conclusion

The three dimensions, namely mathematical, verbal, and intuitive, provide different ways of advancing your knowledge of machine learning and computer vision systems.  So remember to love the linear classifier, talk to your friends, and use a real-time feedback loop when designing your machine learning system.




Sunday, May 09, 2010

graph visualizations as sexy as fractals

I love to display mathematical phenomena -- often for me the proof is in the visualization. If you ever steal one of my personal research notebooks you'll see that the number of graphs I've been drawing over the years has been increasing at a steady rate. This is a habit I acquired from studying Probabilistic Graphical Models and the machine learning-heavy curriculum at CMU.

Back in high school I was amazed by the beauty of fractals based on Newton's method for finding roots, but as I've slowly been shifting my mode of thought from continuous optimization problems to discrete ones, automated graph visualization is as close as I've ever gotten to being an artist. Here is one such sexy graph visualization from Yifan Hu's gallery.


Andrianov/lpl1 via sfdp by Yifan Hu

I have been using Graphviz for about 8 years now, and I just can't get enough. I never thought it would produce anything as beautiful as this! I generally used graphviz to produce graphs like this:



Inspired by Yifan Hu and his amazing multilevel force directed algorithm for visualizing graphs I've started using sfdp for some of my own visualizations. sfdp is now inside graphviz, and can be used with the -K switch as follows (also with overlap=scale):

$ dot -Ksfdp -Tpdf memex.gv > memex.pdf

Inspired by Yifan Hu's coloring scheme based on edge length, I color the edges using a standard matlab jet colormap with shorter edges being red and longer ones being blue. To get the resulting lengths of edges, I actually run sfdp twice -- once to read off the vertex positions (this is what the graph drawing optimization produces), and once again to assign the edge colors based on those lengths. I could process the resulting postscript with one run like Yifan, but I don't want to figure out how to parse postscript files today. Here is an example using some of my own data.
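The two-pass trick can be sketched roughly as follows: run sfdp with -Tplain, parse the node positions it emits, measure each edge, and map short edges toward red and long edges toward blue. The parser below assumes unquoted node names, and the colormap is a crude red-to-blue ramp of my own, not matlab's exact jet:

```python
import math

def parse_plain(text):
    """Pull node positions and edges out of Graphviz 'plain' output.
    Record formats: 'node name x y w h ...' and 'edge tail head ...'.
    Assumes node names are unquoted (no embedded spaces)."""
    pos, edges = {}, []
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "node":
            pos[parts[1]] = (float(parts[2]), float(parts[3]))
        elif parts and parts[0] == "edge":
            edges.append((parts[1], parts[2]))
    return pos, edges

def edge_colors(pos, edges):
    """Hex color per edge: shortest edge red, longest edge blue."""
    lengths = [math.dist(pos[a], pos[b]) for a, b in edges]
    lo, hi = min(lengths), max(lengths)
    span = (hi - lo) or 1.0
    colors = []
    for length in lengths:
        t = (length - lo) / span         # 0 = shortest, 1 = longest
        r, b = int(255 * (1 - t)), int(255 * t)
        colors.append(f"#{r:02x}00{b:02x}")
    return colors
```

The second sfdp pass then just reuses the computed layout with these colors attached as edge attributes.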

Car Concept Visual Memex via sfdp by Tomasz Malisiewicz

This is a visualization of the car subset of the Visual Memex I use as an internal organization of visual concepts to be used for image understanding. If you click on this image, it will show you a significantly larger png.

As a sanity check, I also created a visualization of a standard UF Sparse Matrix (here are both my result and Yifan's)
UTM1700b via sfdp by Yifan Hu

UTM1700b via sfdp by Tomasz Malisiewicz

As you can see, the graphs are pretty similar, modulo some coloring strategy differences -- but since the colors are somewhat arbitrary this is not an issue. If you click on these pictures you can see the PDFs which were generated via graphviz. Now if only my real-world computer vision graph were as structured as these toy problems, then others could view me as both an artist and a scientist (like a true Renaissance man).

Thursday, March 18, 2010

Back to basics: Vision Science Transcends Mathematics

Vision (a.k.a. image understanding, image interpretation, perception, object recognition) is quite unlike some of the mathematical problems we were introduced to in our youth. In fact, thinking of vision as a "mathematical problem" in the traditional sense is questionable. An important characteristic of such "problems" is that by pointing them out we already have a notion of what it would be like to solve them. Does a child think of gravity as such a problem? Well, probably not, because without the necessary mathematical backbone there is no problem with gravity! It's just the way the world works! But once a child has been perverted by mathematics and introduced into the intellectual world of science, the world ceases to just be. The world becomes a massive equation.

Consider the seemingly elementary problem of finding the roots of a cubic polynomial. Many of us can recite the quadratic equation by heart, but not the one for cubics (try deriving the simpler quadratic formula by hand). If we were given one evening and a whole lot of blank paper, we could try tackling this problem (no Google allowed!). While the probability of failure is quite high (and arguably most of us would fail), it would still make sense to speak of "coming closer to the solution". Maybe we could even solve the problem when some terms are missing, etc. The important thing here is that the notion of having reached a solution is well-defined. Also, once we've found the solution it would probably be easier to convince ourselves that it is correct (verification would be easier than actually coming up with the solution).

Vision is more like theoretical physics, psychology, and philosophy and less like the well-defined math problem I described above. When dealing with the math problem described above, we know what the symbols mean, we know the valid operations -- the game is already set in place. In vision, just like physics, psychology and philosophy, the notion of a fundamental operational unit (which happens to be an object for vision) isn't rigidly defined like the Platonic Ideals used throughout mathematics. We know what a circle is, we know what a real-valued variable is, but what is a "car"? Consider your mental image of a car. Now remove a wheel and ask yourself, is this still a car? Surely! But what happens as we start removing more and more elements. At what point does this object cease to be a car and become a motor, a single tire, or a piece of metal? The circle, a Platonic Ideal, ceases to be a circle once it has suffered the most trivial of all perturbations -- any deviation from perfection, and boom! the circle ceases to be a circle.

Much of Computer Vision does not ask such metaphysical questions, as objects of the real world are seamlessly mapped to abstract symbols that our mathematically-inclined PhD students love to play with. I am sad to report that this naive mapping between objects of the real world and mathematical symbols isn't so much a question of style, it is basically the foundation of modern computer vision research. So what must be done to expand this parochial field of Vision into a mature field? Wake up and stop coding! I think Vision needs a sort of a mental coup d'état, a fresh outlook on an old problem. Sometimes to make progress we have to start with a clean slate -- current visionaries do not possess the right tools for this challenging enterprise. Instead of throwing higher-level mathematics at the problem, maybe we are barking up the wrong tree? However, if mathematics is the only thing we are good at, then how are we to have a mature discussion which transcends mathematics? The window through which we peer circumscribes the world we see.

I believe if we are to make progress in this challenging endeavor, we must first become Renaissance men, a sort of Nietzschean Übermensch. We must understand what has been said about perception, space, time, and the structure of the universe. We must become better historians. We must study not only more mathematics, but more physics, more psychology, read more Aristotle and Kant, build better robots, engineer stabler software, become better sculptors and painters, become more articulate orators, establish better personal relationships, etc. Once we've mastered more domains of reality, and only then, will we have a better set of tools for coping with paradoxes inherent in artificial intelligence. Because a better grasp on reality -- inching closer to enlightenment -- will result in asking more meaningful questions.

I am optimistic. But the enterprise which I've outlined will require a new type of individual, one worthy of the name Renaissance Man. We aren't interested in toy problems here, nor cute solutions. If we want to make progress, we must shape our lives and outlooks around this very fact. Two steps backwards and three steps forward. Rinse, lather, repeat.

Wednesday, December 24, 2008

Newton's Method Fractal Yet Again

Yet another Newton's Method Fractal Animation. This one is created from the OpenGL C++ program I wrote some time ago on my Macbook Pro. I dumped the frames as PPMs, used ImageMagick to convert them to PNGs (a shell-script one-liner), FrameByFrame (a great free OS X product) to make a movie from the frames, and iMovie to add music/titles. The song in the background is a New Deal cover of Journey's Separate Ways from 2003-02-21 (check it out on archive.org).
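For the curious, here's a short sketch of the per-pixel computation behind such a fractal (my own simplified version, not the original OpenGL code): iterate Newton's method on f(z) = z^3 - 1 and record which cube root of unity each starting point converges to, then color the pixel by that basin index.

```python
import cmath

# The three cube roots of unity, i.e. the roots of z^3 - 1.
ROOTS = [cmath.exp(2j * cmath.pi * k / 3) for k in range(3)]

def newton_basin(z, max_iter=50, tol=1e-6):
    """Index of the root z converges to under Newton's method,
    or -1 if it fails to settle within max_iter iterations."""
    for _ in range(max_iter):
        if abs(z) < tol:          # derivative 3z^2 vanishes at the origin
            return -1
        z = z - (z**3 - 1) / (3 * z**2)
        for i, r in enumerate(ROOTS):
            if abs(z - r) < tol:
                return i
    return -1

# Color a small grid of starting points by basin index; each frame of
# the animation re-runs this with a slightly perturbed polynomial.
grid = [[newton_basin(complex(x, y)) for x in (-1.0, 0.5, 1.0)]
        for y in (-1.0, 0.0, 1.0)]
```

The intricate boundaries between the three basins are exactly the fractal structure you see in the video.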



In the future I plan on synchronizing the music with the fractals. Here is a cool screenshot from the movie when the background becomes white.

Friday, December 05, 2008

Using Computer Vision to Solve Jigsaw Puzzles

This past Thanksgiving I took a little bit of time to see if I could solve Jonathan Huang's Puzzle. While I haven't yet solved the task since I could only afford to put a couple of hours of work into it, here is a nice debug screenshot of the local puzzle piece alignment strategy I've been pursuing.

In this image I've shown puzzle piece A which is fixed and in red, and puzzle piece B as well as some likely transformations that when applied to B snap it to piece A. If I have more time over Xmas break and I get to finish the final puzzle -- I'll be sure to post the details.
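The local alignment idea can be sketched in a few lines. This is a toy illustration of the "snap" scoring, with made-up pieces and candidate transforms, not the actual puzzle-solving code: treat each piece's boundary as a set of 2D points, apply candidate rigid transforms to piece B, and keep the transform under which B lands closest to piece A.

```python
import math

def transform(points, theta, tx, ty):
    """Rotate each (x, y) point by theta, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def snap_score(a_pts, b_pts):
    """Mean distance from each transformed B point to its nearest
    A point (lower is better)."""
    return sum(min(math.dist(p, q) for q in a_pts) for p in b_pts) / len(b_pts)

# Piece A's edge lies along the x-axis; piece B's edge is the same
# shape rotated 90 degrees away from it.
piece_a = [(0, 0), (1, 0), (2, 0)]
piece_b = [(0, 0), (0, 1), (0, 2)]

# Try a few candidate transforms and keep the one that snaps best.
candidates = [(0.0, 0, 0), (-math.pi / 2, 0, 0), (math.pi, 1, 1)]
best = min(candidates,
           key=lambda t: snap_score(piece_a, transform(piece_b, *t)))
```

A real solver would search or estimate the transforms from boundary features rather than enumerate a fixed list, but the fix-A, move-B scoring loop is the same.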

Wednesday, April 23, 2008

newton's method fractal

Back in high school I was 'into' newton's method fractals. Some old images can be seen by clicking on the following image


When people make fractal videos (check them out on youtube), they are usually zooming into a fixed fractal. I have generated a fractal where the axis is fixed and the equation is changing. Check it out!